Description

This plugin is a classic web spider: it requests a URL and extracts all links and forms from the response. Three configurable parameters exist:

  • only_forward
  • ignoreRegex
  • followRegex

ignoreRegex and followRegex are commonly used to make web_spider crawl every URL except “logout” or a more disruptive link such as “Reboot Appliance”, which would end the w3af run without the expected result. By default ignoreRegex is an empty string (nothing is ignored) and followRegex is ‘.*’ (everything is followed). Both are ordinary regular expressions compiled with Python’s re module. They are applied to each discovered URL with the match function, which only matches at the beginning of the string, so a pattern meant to hit text in the middle of a URL needs a leading ‘.*’.
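
The filtering described above can be sketched in a few lines of Python. This is a minimal illustration and not the plugin's actual code: the function name should_follow is hypothetical, and treating an empty ignoreRegex as "ignore nothing" by skipping the check is an assumption made here, since an empty pattern passed to re.match would otherwise match every URL.

```python
import re


def should_follow(url, follow_regex=".*", ignore_regex=""):
    """Return True if this sketch of the spider would follow `url`.

    Hypothetical helper for illustration; not part of w3af.
    """
    # Assumption: an empty ignore pattern disables the ignore check.
    # ignoreRegex has precedence, so it is evaluated first.
    if ignore_regex and re.match(ignore_regex, url):
        return False
    # re.match anchors at the start of the string, so the default ".*"
    # accepts every URL.
    return re.match(follow_regex, url) is not None


ignore = r".*(logout|reboot).*"
print(should_follow("http://example.com/index.php", ignore_regex=ignore))
print(should_follow("http://example.com/logout.php", ignore_regex=ignore))
# Without the leading ".*" the pattern is tested only at the start of the URL,
# so it does not ignore the logout link.
print(should_follow("http://example.com/logout.php", ignore_regex="logout"))
```

The last call shows why the leading ‘.*’ matters: a bare ‘logout’ pattern never matches a full URL that starts with ‘http://’.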

Plugin type

Crawl

Options

Name | Type | Default Value | Description | Help
only_forward | boolean | False | When spidering, only search directories inside the one that was given as target | No detailed help available
followRegex | regex | .* | When spidering, only follow links that match this regular expression (ignoreRegex has precedence over followRegex) | No detailed help available
ignoreRegex | regex | (empty) | When spidering, DO NOT follow links that match this regular expression (has precedence over followRegex) | No detailed help available
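
As a rough illustration of what only_forward restricts, the sketch below checks whether a discovered URL lives inside the directory of the target URL. The function name is_forward and the exact comparison rules (same scheme and host, path prefix on the target's directory) are assumptions for this example; the plugin's own logic may differ.

```python
from urllib.parse import urlsplit


def is_forward(target_url, found_url):
    """Return True if `found_url` is inside the directory of `target_url`.

    Hypothetical helper for illustration; not part of w3af.
    """
    target = urlsplit(target_url)
    found = urlsplit(found_url)
    # Assumption: links to another scheme or host are never "forward".
    if (target.scheme, target.netloc) != (found.scheme, found.netloc):
        return False
    # Directory of the target: the path up to and including the last "/".
    target_dir = target.path[: target.path.rfind("/") + 1] or "/"
    found_path = found.path or "/"
    return found_path.startswith(target_dir)


target = "http://example.com/app/index.php"
print(is_forward(target, "http://example.com/app/admin/login.php"))
print(is_forward(target, "http://example.com/other/page.php"))
```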

Source

For more information about this plugin and the associated tests, there’s always the source code to understand exactly what’s under the hood:
Plugin source code
Unittest source code

Dependencies

This plugin has no dependencies.