nyawc.helpers.DebugHelper.
DebugHelper
[source]¶Bases: object
A helper for printing debug messages.
output
(options, message)[source]¶Print the given message if the debug option in the given options is on.
Parameters: |
|
---|
setup
(options)[source]¶Initialize debug/logging in third party libraries correctly.
Parameters: | options (nyawc.Options ) – The options to use for the current crawling runtime. |
---|
nyawc.helpers.HTTPRequestHelper.
HTTPRequestHelper
[source]¶Bases: object
A helper for the src.http.Request module.
complies_with_scope
(queue_item, new_request, scope)[source]¶Check if the new request complies with the crawling scope.
Parameters: |
|
---|---|
Returns: | True if it complies, False otherwise. |
Return type: | bool |
Convert a requests cookie jar to an HTTP request cookie header value.
Parameters: | queue_item (nyawc.QueueItem ) – The parent queue item of the new request. |
---|---|
Returns: | The HTTP cookie header value. |
Return type: | str |
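To illustrate the shape of a cookie header value, here is a minimal sketch that joins name/value pairs from a plain dict. It does not use a requests cookie jar or any nyawc code, and `cookies_to_header` is a hypothetical name chosen for this example.

```python
def cookies_to_header(cookies):
    """Join cookie name/value pairs into a single Cookie header value."""
    # Hypothetical helper: takes a plain dict instead of a requests cookie jar.
    return "; ".join("{}={}".format(name, value) for name, value in cookies.items())


header = cookies_to_header({"session": "abc123", "lang": "en"})
print(header)  # session=abc123; lang=en
```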
patch_with_options
(request, options, parent_queue_item=None)[source]¶Patch the given request with the given options (e.g. user agent).
Parameters: |
|
---|
nyawc.helpers.PackageHelper.
PackageHelper
[source]¶Bases: object
The Package class contains all the package-related information (like the version number).
get_alias
()[source]¶Get the alias of this package.
Returns: | The alias of this package. |
---|---|
Return type: | str |
get_description
()[source]¶Get the description of this package.
Returns: | The description of this package. |
---|---|
Return type: | str |
get_name
()[source]¶Get the name of this package.
Returns: | The name of this package. |
---|---|
Return type: | str |
get_version
()[source]¶Get the version number of this package.
Returns: | The version number (major.minor.patch). |
---|---|
Return type: | str |
Note
When this package is installed, the version number will be available through the package resource details. Otherwise this method will look for a .semver file.
Note
In rare cases corrupt installs can cause the version number to be unknown. In this case the version number will be set to the string “Unknown”.
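The lookup order described in the two notes above can be sketched roughly as follows. This is not the nyawc implementation: `read_version` is a hypothetical name, `importlib.metadata` stands in for "package resource details", and the .semver file is assumed to contain only the version string.

```python
from importlib import metadata
from pathlib import Path


def read_version(package_name, semver_path=".semver"):
    """Return the installed version, else the .semver contents, else "Unknown"."""
    try:
        # Installed packages expose their version through package metadata.
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        pass

    # Assumption: the fallback file holds the bare version string.
    path = Path(semver_path)
    if path.is_file():
        return path.read_text().strip()

    return "Unknown"
```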
nyawc.helpers.RandomInputHelper.
RandomInputHelper
[source]¶Bases: object
A helper for generating random user input.
Note
The generated values are cached to prevent infinite crawling loops. For example, if two responses contain the same ?search= form, the randomly generated value must be the same both times; otherwise the crawler would treat the new requests as two different requests.
cache
= {}
get_for_type
(input_type='text')[source]¶Get a random string for the given HTML input type.
Parameters: | input_type (str) – The input type (e.g. email). |
---|---|
Returns: | The (cached) random value. |
Return type: | str |
get_random_color
()[source]¶Get a random color in HEX format (including hash character).
Returns: | The random HEX color. |
---|---|
Return type: | str |
get_random_email
(ltd='com')[source]¶Get a random email address with the given top-level domain (the ltd parameter).
Parameters: | ltd (str) – The top-level domain to use (e.g. com). |
---|---|
Returns: | The random email. |
Return type: | str |
get_random_number
(length=4)[source]¶Get a random number with the given length.
Parameters: | length (int) – The length of the number to return. |
---|---|
Returns: | The random number. |
Return type: | str |
get_random_password
()[source]¶Get a random password that complies with most of the requirements.
Note
This random password is not strong and not “really” random, and should only be used for testing purposes.
Returns: | The random password. |
---|---|
Return type: | str |
get_random_telephonenumber
()[source]¶Get a random 10-digit phone number that complies with most of the requirements.
Returns: | The random telephone number. |
---|---|
Return type: | str |
get_random_text
()[source]¶Get a random string with the given length.
Parameters: | length (int) – The length of the string to return. |
---|---|
Returns: | The random string. |
Return type: | str |
get_random_url
(ltd='com')[source]¶Get a random URL with the given top-level domain (the ltd parameter).
Parameters: | ltd (str) – The top-level domain to use (e.g. com). |
---|---|
Returns: | The random url. |
Return type: | str |
get_random_value
(length=10, character_sets=['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'])[source]¶Get a random string with the given length.
Parameters: |
|
---|---|
Returns: | The random string. |
Return type: | str |
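One plausible reading of the signature above is that characters are drawn from the union of the given character sets. The sketch below assumes exactly that; the library may combine the sets differently (for example, by guaranteeing at least one character from each set). `random_value` is a hypothetical name.

```python
import random


def random_value(length=10, character_sets=("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                                            "abcdefghijklmnopqrstuvwxyz")):
    """Return a random string of the given length drawn from all character sets."""
    # Assumption: the sets are simply concatenated into one pool.
    pool = "".join(character_sets)
    return "".join(random.choice(pool) for _ in range(length))


print(random_value(8))
print(random_value(6, ["0123456789"]))
```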
nyawc.helpers.URLHelper.
URLHelper
[source]¶Bases: object
A helper for URL strings.
append_with_data
(url, data)[source]¶Append the given data (an OrderedDict) to the query string of the given URL.
Parameters: |
|
---|---|
Returns: | The new URL. |
Return type: | str |
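A rough standard-library equivalent of appending data to a URL might look like this. `append_data` is a hypothetical name, and two behaviours are assumptions of this sketch rather than documented library behaviour: values are percent-encoded by `urlencode`, and a key that already exists in the URL is overwritten.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_data(url, data):
    """Merge the key/value pairs in data into the query string of url."""
    parts = urlsplit(url)
    # Keep existing parameters in their original order.
    query = OrderedDict(parse_qsl(parts.query, keep_blank_values=True))
    # Assumption: new values replace existing values with the same key.
    query.update(data)
    return urlunsplit(parts._replace(query=urlencode(query)))


print(append_data("http://example.com/search?a=1", OrderedDict([("b", "2")])))
# http://example.com/search?a=1&b=2
```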
get_hostname
(url)[source]¶Get the hostname of the given URL.
Parameters: | url (str) – The URL to get the hostname from. |
---|---|
Returns: | The hostname. |
Return type: | str |
get_ordered_params
(url)[source]¶Get the query parameters of the given URL in alphabetical order.
Parameters: | url (str) – The URL to get the query parameters from. |
---|---|
Returns: | The query parameters. |
Return type: | str |
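Sorting query parameters alphabetically can be sketched with the standard library as follows. `ordered_params` is a hypothetical name, and this sketch returns an OrderedDict for readability even though the return type listed above is str.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl, urlsplit


def ordered_params(url):
    """Return the query parameters of url sorted by key."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    # Sorting the (key, value) tuples orders them by key, then by value.
    return OrderedDict(sorted(pairs))


params = ordered_params("http://example.com/?b=2&a=1&c=3")
print(list(params.items()))  # [('a', '1'), ('b', '2'), ('c', '3')]
```

Normalizing the order this way lets two URLs that differ only in parameter order be compared as equal.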
get_path
(url)[source]¶Get the path (e.g. /page/23) of the given URL.
Parameters: | url (str) – The URL to get the path from. |
---|---|
Returns: | The path. |
Return type: | str |
get_protocol
(url)[source]¶Get the protocol (e.g. http, https or ftp) of the given URL.
Parameters: | url (str) – The URL to get the protocol from. |
---|---|
Returns: | The URL protocol. |
Return type: | str |
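For orientation, the hostname, path and protocol that get_hostname, get_path and get_protocol return correspond to the following components in Python's standard `urllib.parse`. This only shows the standard-library view of a URL; whether the helpers are implemented this way is not stated here.

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.example.com/page/23?x=1")

print(parsed.scheme)    # https
print(parsed.hostname)  # www.example.com
print(parsed.path)      # /page/23
```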
get_subdomain
(url)[source]¶Get the subdomain of the given URL.
Parameters: | url (str) – The URL to get the subdomain from. |
---|---|
Returns: | The subdomain(s). |
Return type: | str |
get_tld
(url)[source]¶Get the TLD (top-level domain) of the given URL.
Parameters: | url (str) – The URL to get the TLD from. |
---|---|
Returns: | The TLD. |
Return type: | str |
is_mailto
(url)[source]¶Check if the given URL is a mailto URL.
Parameters: | url (str) – The URL to check. |
---|---|
Returns: | True if mailto, False otherwise. |
Return type: | bool |
is_parsable
(url)[source]¶Check if the given URL is parsable (make sure it’s a valid URL). If it is parsable, also cache it.
Parameters: | url (str) – The URL to check. |
---|---|
Returns: | True if parsable, False otherwise. |
Return type: | bool |
make_absolute
(base, relative)[source]¶Make the given (relative) URL absolute.
Parameters: |
|
---|---|
Returns: | The absolute URL. |
Return type: | str |
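Resolving a relative URL against a base is what the standard library's `urljoin` does. The examples below show that standard behaviour; they are not taken from the nyawc source, which may handle edge cases differently.

```python
from urllib.parse import urljoin

base = "http://example.com/a/b.html"

print(urljoin(base, "c.html"))              # http://example.com/a/c.html
print(urljoin(base, "../c.html"))           # http://example.com/c.html
print(urljoin(base, "/x"))                  # http://example.com/x
print(urljoin(base, "https://other.org/"))  # https://other.org/
```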
query_dict_to_string
(query)[source]¶Convert an OrderedDict to a query string.
Parameters: | query (obj) – The key value object with query params. |
---|---|
Returns: | The query string. |
Return type: | str |
Note
This method does the same as urllib.parse.urlencode except that it doesn’t actually encode the values.
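The difference from `urllib.parse.urlencode` mentioned in the note can be shown with a small sketch. `query_to_string` is a hypothetical stand-in for the behaviour described, not the library's code.

```python
from collections import OrderedDict
from urllib.parse import urlencode


def query_to_string(query):
    """Join key/value pairs into a query string without encoding the values."""
    return "&".join("{}={}".format(key, value) for key, value in query.items())


query = OrderedDict([("q", "a b"), ("page", "2")])

print(query_to_string(query))  # q=a b&page=2
print(urlencode(query))        # q=a+b&page=2
```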