nyawc.helpers.DebugHelper.DebugHelper
A helper for printing debug messages.
output(options, message)
Print the given message if the debug option is enabled in the given options.
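| Parameters: | options (nyawc.Options) – The options to use for the current crawling runtime; message – The message to print. |
|---|---|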
setup(options)
Initialize debug/logging in third-party libraries correctly.
| Parameters: | options (nyawc.Options) – The options to use for the current crawling runtime. |
|---|
nyawc.helpers.HTTPRequestHelper.HTTPRequestHelper
A helper for the src.http.Request module.
complies_with_scope(queue_item, new_request, scope)
Check if the new request complies with the crawling scope.
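| Parameters: | queue_item (nyawc.QueueItem) – The parent queue item of the new request; new_request – The request to check; scope – The crawling scope to check against. |
|---|---|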
| Returns: | True if it complies, False otherwise. |
| Return type: | bool |
Convert a requests cookie jar to an HTTP request cookie header value.
| Parameters: | queue_item (nyawc.QueueItem) – The parent queue item of the new request. |
|---|---|
| Returns: | The HTTP cookie header value. |
| Return type: | str |
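The conversion described above can be sketched as follows. This is a minimal illustration, assuming the cookies are available as a plain name/value mapping; `cookies_to_header` is a hypothetical helper and not part of nyawc.

```python
def cookies_to_header(cookies):
    """Join name/value pairs into a Cookie header value (hypothetical helper)."""
    # Pairs are separated by "; ", which is the usual Cookie header layout.
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


header = cookies_to_header({"session": "abc123", "lang": "en"})
print(header)  # session=abc123; lang=en
```

The resulting string is what would be sent as the value of the `Cookie` request header.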
patch_with_options(request, options, parent_queue_item=None)
Patch the given request with the given options (e.g. user agent).
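| Parameters: | request – The request to patch; options (nyawc.Options) – The options to patch the request with; parent_queue_item – The parent queue item of the request, if any (defaults to None). |
|---|---|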
nyawc.helpers.PackageHelper.PackageHelper
The Package class contains all package-related information (such as the version number).
get_alias()
Get the alias of this package.
| Returns: | The alias of this package. |
|---|---|
| Return type: | str |
get_description()
Get the description of this package.
| Returns: | The description of this package. |
|---|---|
| Return type: | str |
get_name()
Get the name of this package.
| Returns: | The name of this package. |
|---|---|
| Return type: | str |
get_version()
Get the version number of this package.
| Returns: | The version number (major.minor.patch). |
|---|---|
| Return type: | str |
Note
When this package is installed, the version number will be available through the
package resource details. Otherwise this method will look for a .semver file.
Note
In rare cases corrupt installs can cause the version number to be unknown. In this case the version number will be set to the string “Unknown”.
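The lookup order described in the two notes above can be sketched as follows. This is only an illustration under stated assumptions: it uses `importlib.metadata` from the standard library in place of the package resource lookup, and `lookup_version` and both of its arguments are hypothetical names.

```python
from importlib import metadata
from pathlib import Path


def lookup_version(package_name, semver_path):
    """Return the installed version, else the .semver content, else "Unknown"."""
    try:
        # Installed package: read the version from the package metadata.
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        pass

    try:
        # Not installed: fall back to a .semver file next to the sources.
        return Path(semver_path).read_text(encoding="utf-8").strip()
    except OSError:
        # Neither source is available (e.g. a corrupt install).
        return "Unknown"
```

A call such as `lookup_version("nyawc", ".semver")` would try the installed metadata first and only then read the file.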
nyawc.helpers.RandomInputHelper.RandomInputHelper
A helper for generating random user input.
Note
The generated values are cached to prevent infinite crawling loops. For example, if two responses contain the same ?search= form, the randomly generated value must be identical both times; otherwise the crawler would treat the resulting requests as two different requests.
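The caching idea in this note can be sketched as follows. It is a minimal illustration, not nyawc's implementation: `_cache` and `cached_random_value` are hypothetical names, and the value is assumed to be a string of ASCII letters.

```python
import random
import string

_cache = {}  # hypothetical module-level cache, keyed by input type


def cached_random_value(input_type="text", length=10):
    """Return a random string for input_type, reusing the first one generated."""
    if input_type not in _cache:
        _cache[input_type] = "".join(
            random.choice(string.ascii_letters) for _ in range(length)
        )
    return _cache[input_type]


first = cached_random_value("search")
second = cached_random_value("search")
print(first == second)  # True: the second call reuses the cached value
```

Because both calls return the same value, two forms of the same type yield identical requests, which a crawler can then recognize as duplicates.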
cache = {}
get_for_type(input_type='text')
Get a random string for the given HTML input type.
| Parameters: | input_type (str) – The input type (e.g. email). |
|---|---|
| Returns: | The (cached) random value. |
| Return type: | str |
get_random_color()
Get a random color in HEX format (including the hash character).
| Returns: | The random HEX color. |
|---|---|
| Return type: | str |
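One way to produce such a color is sketched below. `random_hex_color` is a hypothetical name, and details such as letter case are assumptions that may differ from nyawc's own output.

```python
import random


def random_hex_color():
    """Return a random color such as '#1a2b3c' (hash plus six hex digits)."""
    # 0xFFFFFF is the largest 24-bit RGB value; :06x pads to six hex digits.
    return "#{:06x}".format(random.randint(0, 0xFFFFFF))


print(random_hex_color())
```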
get_random_email(ltd='com')
Get a random email address with the given top-level domain (the ltd argument).
| Parameters: | ltd (str) – The top-level domain to use (e.g. com). |
|---|---|
| Returns: | The random email. |
| Return type: | str |
get_random_number(length=4)
Get a random number with the given length.
| Parameters: | length (int) – The length of the number to return. |
|---|---|
| Returns: | The random number. |
| Return type: | str |
get_random_password()
Get a random password that complies with most of the requirements.
Note
This random password is not strong and not “really” random, and should only be used for testing purposes.
| Returns: | The random password. |
|---|---|
| Return type: | str |
get_random_telephonenumber()
Get a random 10-digit phone number that complies with most of the requirements.
| Returns: | The random telephone number. |
|---|---|
| Return type: | str |
get_random_text()
Get a random string with the given length.
| Parameters: | length (int) – The length of the string to return. |
|---|---|
| Returns: | The random string. |
| Return type: | str |
get_random_url(ltd='com')
Get a random URL with the given top-level domain (the ltd argument).
| Parameters: | ltd (str) – The top-level domain to use (e.g. com). |
|---|---|
| Returns: | The random url. |
| Return type: | str |
get_random_value(length=10, character_sets=['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'])
Get a random string with the given length.
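| Parameters: | length (int) – The length of the string to return; character_sets – The character sets to draw characters from. |
|---|---|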
| Returns: | The random string. |
| Return type: | str |
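A minimal sketch of generating a value from several character sets follows. It assumes characters are drawn uniformly from the concatenation of all sets, which may not match how nyawc combines them; `random_value` is a hypothetical name.

```python
import random
import string


def random_value(length=10, character_sets=(string.ascii_uppercase,
                                            string.ascii_lowercase)):
    """Return `length` characters drawn from the combined character sets."""
    alphabet = "".join(character_sets)  # assumption: all sets are pooled
    return "".join(random.choice(alphabet) for _ in range(length))


print(random_value())                  # ten letters
print(random_value(4, ["0123456789"]))  # four digits
```

The sketch uses a tuple as the default so that the default argument cannot be mutated between calls.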
nyawc.helpers.URLHelper.URLHelper
A helper for URL strings.
append_with_data(url, data)
Append the given data OrderedDict to the given URL.
| Parameters: | url (str) – The URL to append the data to; data – The OrderedDict with the data to append. |
|---|---|
| Returns: | The new URL. |
| Return type: | str |
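Appending data to a URL can be sketched with the standard library as follows. This is an assumption-laden illustration: `append_query_data` is a hypothetical name, existing keys are assumed to be overwritten by the new data, and values are percent-encoded by `urlencode`, which nyawc may not do.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_query_data(url, data):
    """Merge `data` into the query string of `url` and return the new URL."""
    parts = urlsplit(url)
    # Keep the existing parameters in their original order.
    query = OrderedDict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(data)  # assumption: new values replace existing keys
    return urlunsplit(parts._replace(query=urlencode(query)))


print(append_query_data("https://example.com/search?q=a",
                        OrderedDict([("page", "2")])))
# https://example.com/search?q=a&page=2
```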
get_hostname(url)
Get the hostname of the given URL.
| Parameters: | url (str) – The URL to get the hostname from. |
|---|---|
| Returns: | The hostname. |
| Return type: | str |
get_ordered_params(url)
Get the query parameters of the given URL in alphabetical order.
| Parameters: | url (str) – The URL to get the query parameters from. |
|---|---|
| Returns: | The query parameters. |
| Return type: | str |
get_path(url)
Get the path (e.g. /page/23) of the given URL.
| Parameters: | url (str) – The URL to get the path from. |
|---|---|
| Returns: | The path. |
| Return type: | str |
get_protocol(url)
Get the protocol (e.g. http, https or ftp) of the given URL.
| Parameters: | url (str) – The URL to get the protocol from. |
|---|---|
| Returns: | The URL protocol. |
| Return type: | str |
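The hostname, path and protocol of a URL can be read with the standard library as sketched below. This only illustrates the three concepts using `urllib.parse.urlsplit`; `split_url` is a hypothetical name and says nothing about how URLHelper parses URLs internally.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (protocol, hostname, path) for the given URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


protocol, hostname, path = split_url("https://blog.example.com/page/23?sort=asc")
print(protocol)  # https
print(hostname)  # blog.example.com
print(path)      # /page/23
```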
get_subdomain(url)
Get the subdomain of the given URL.
| Parameters: | url (str) – The URL to get the subdomain from. |
|---|---|
| Returns: | The subdomain(s). |
| Return type: | str |
get_tld(url)
Get the TLD (top-level domain) of the given URL.
| Parameters: | url (str) – The URL to get the TLD from. |
|---|---|
| Returns: | The TLD. |
| Return type: | str |
is_mailto(url)
Check if the given URL is a mailto URL.
| Parameters: | url (str) – The URL to check. |
|---|---|
| Returns: | True if mailto, False otherwise. |
| Return type: | bool |
is_parsable(url)
Check if the given URL is parsable (make sure it’s a valid URL). If it is parsable, also cache it.
| Parameters: | url (str) – The URL to check. |
|---|---|
| Returns: | True if parsable, False otherwise. |
| Return type: | bool |
make_absolute(base, relative)
Make the given (relative) URL absolute.
| Parameters: | base – The base URL to resolve against; relative – The (relative) URL to make absolute. |
|---|---|
| Returns: | The absolute URL. |
| Return type: | str |
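Resolving a relative URL against a base can be sketched with `urllib.parse.urljoin` from the standard library. Whether make_absolute behaves identically in every edge case is an assumption; `to_absolute` is a hypothetical wrapper used only for illustration.

```python
from urllib.parse import urljoin


def to_absolute(base, relative):
    """Resolve `relative` against `base` following standard URL rules."""
    return urljoin(base, relative)


base = "https://example.com/docs/index.html"
print(to_absolute(base, "page2.html"))       # https://example.com/docs/page2.html
print(to_absolute(base, "../img/logo.png"))  # https://example.com/img/logo.png
print(to_absolute(base, "/about"))           # https://example.com/about
```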
query_dict_to_string(query)
Convert an OrderedDict to a query string.
| Parameters: | query (obj) – The key value object with query params. |
|---|---|
| Returns: | The query string. |
| Return type: | str |
Note
This method does the same as urllib.parse.urlencode except that it doesn’t actually encode the values.
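The difference mentioned in this note can be illustrated as follows. The sketch assumes that key/value pairs are joined with `=` and `&` and left unencoded; `query_to_string` is a hypothetical stand-in, not the library's code.

```python
from collections import OrderedDict
from urllib.parse import urlencode


def query_to_string(query):
    """Join key/value pairs into a query string without encoding them."""
    return "&".join(f"{key}={value}" for key, value in query.items())


query = OrderedDict([("q", "a b"), ("redirect", "/home")])
print(query_to_string(query))  # q=a b&redirect=/home
print(urlencode(query))        # q=a+b&redirect=%2Fhome
```

Leaving values unencoded keeps them exactly as supplied, so the caller is responsible for any encoding the target server requires.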