Description
This plugin is a classic web spider: it requests a URL and extracts all links and forms from the response. Three configurable parameters exist:
- only_forward
- ignoreRegex
- followRegex
ignoreRegex and followRegex are commonly used to configure web_spider to crawl all URLs except “logout” or some more dangerous link such as “Reboot Appliance”, which would make the w3af run finish without the expected result. By default ignoreRegex is an empty string (nothing is ignored) and followRegex is ‘.*’ (everything is followed). If a URL matches both, ignoreRegex takes precedence. Both are ordinary regular expressions compiled with Python’s re module and applied to each discovered URL using the match function. Because match only matches from the beginning of the string, a pattern meant to catch text in the middle of a URL needs a leading ‘.*’ (for example ‘.*logout.*’ rather than ‘logout’).
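The filtering logic described above can be sketched in a few lines of Python. This is only an illustration, not the plugin's actual code: the helper name should_follow and the example URLs are hypothetical, and treating an empty ignoreRegex as "ignore nothing" is an assumption made here because an empty pattern passed to re.match would otherwise match every URL.

```python
import re


def should_follow(url, follow_regex=".*", ignore_regex=""):
    """Hypothetical helper (not part of w3af): decide whether to follow a URL."""
    # Assumption: an empty ignore pattern means "ignore nothing".
    # ignoreRegex is checked first because it has precedence over followRegex.
    if ignore_regex and re.match(ignore_regex, url):
        return False
    # re.match anchors the pattern at the start of the URL.
    return re.match(follow_regex, url) is not None


if __name__ == "__main__":
    # Defaults: everything is followed.
    print(should_follow("http://host/index.php"))  # True
    # A leading ".*" lets the pattern match text in the middle of the URL.
    print(should_follow("http://host/logout.php", ignore_regex=".*logout.*"))  # False
    # Without it, match() fails at the start of the URL, so nothing is ignored.
    print(should_follow("http://host/logout.php", ignore_regex="logout"))  # True
```

The last call shows the most common pitfall: a bare ‘logout’ pattern does not exclude the logout link, because the URL starts with ‘http’.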
Plugin type
Options
Name | Type | Default Value | Description | Help |
only_forward | boolean | False | When spidering, only search directories inside the one that was given as target | No detailed help available |
followRegex | regex | .* | When spidering, only follow links that match this regular expression (ignoreRegex has precedence over followRegex) | No detailed help available |
ignoreRegex | regex | (empty string) | When spidering, DO NOT follow links that match this regular expression (has precedence over followRegex) | No detailed help available |
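To make the only_forward description more concrete, the following is a rough sketch of what "only search directories inside the one that was given as target" could mean. It is an assumption about the behaviour, not w3af's implementation: the function name is_forward is hypothetical, and the rule used here (same scheme and host, path starting with the target's directory) is one plausible reading of the option.

```python
from urllib.parse import urlsplit


def is_forward(target_url, found_url):
    """Hypothetical check: is found_url inside the directory of target_url?"""
    target = urlsplit(target_url)
    found = urlsplit(found_url)
    # Assumption: a link on a different scheme or host is never "forward".
    if (target.scheme, target.netloc) != (found.scheme, found.netloc):
        return False
    # Directory of the target: its path up to and including the last "/".
    target_dir = target.path.rsplit("/", 1)[0] + "/"
    # Treat an empty path (e.g. "http://host") as the root directory.
    found_path = found.path or "/"
    return found_path.startswith(target_dir)


if __name__ == "__main__":
    target = "http://host/app/index.php"
    print(is_forward(target, "http://host/app/admin/users.php"))  # True
    print(is_forward(target, "http://host/other/page.php"))       # False
    print(is_forward(target, "http://host/application/x.php"))    # False
```

Comparing against the directory with its trailing slash avoids treating ‘/application/’ as being inside ‘/app/’.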
Source
For more information about this plugin and its associated tests, the source code shows exactly what happens under the hood:
Plugin source code
Unittest source code
Dependencies
This plugin has no dependencies.