nyawc.helpers package

Submodules

nyawc.helpers.DebugHelper module

class nyawc.helpers.DebugHelper.DebugHelper[source]

A helper for printing debug messages.

static output(options, message)[source]

Print the given message if the debug option in the given options is on.

Parameters:
  • options (nyawc.Options) – The options to use for the current crawling runtime.
  • message (str) – The message to print.
static setup(options)[source]

Initialize debug/logging in third party libraries correctly.

Parameters:options (nyawc.Options) – The options to use for the current crawling runtime.

nyawc.helpers.HTTPRequestHelper module

class nyawc.helpers.HTTPRequestHelper.HTTPRequestHelper[source]

A helper for the src.http.Request module.

static complies_with_scope(queue_item, new_request, scope)[source]

Check if the new request complies with the crawling scope.

Parameters:
Returns:

True if it complies, False otherwise.

Return type:

bool

Convert a requests cookie jar to a HTTP request cookie header value.

Parameters:queue_item (nyawc.QueueItem) – The parent queue item of the new request.
Returns:The HTTP cookie header value.
Return type:str
static patch_with_options(request, options, parent_queue_item=None)[source]

Patch the given request with the given options (e.g. user agent).

Parameters:

nyawc.helpers.PackageHelper module

class nyawc.helpers.PackageHelper.PackageHelper[source]

The Package class contains all the package related information (like the version number).

__name[source]

Cached package name.

Type:str
__description[source]

Cached package description.

Type:str
__alias[source]

Cached package alias.

Type:str
__version[source]

Cached package version number (if initialized).

Type:str
static get_alias()[source]

Get the alias of this package.

Returns:The alias of this package.
Return type:str
static get_description()[source]

Get the description of this package.

Returns:The description of this package.
Return type:str
static get_name()[source]

Get the name of this package.

Returns:The name of this package.
Return type:str
static get_version()[source]

Get the version number of this package.

Returns:The version number (major.minor.patch).
Return type:str

Note

When this package is installed, the version number will be available through the package resource details. Otherwise this method will look for a .semver file.

Note

In rare cases corrupt installs can cause the version number to be unknown. In this case the version number will be set to the string “Unknown”.
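The lookup order described in the two notes above can be sketched as follows. This is only an illustration, not the package's actual code: `resolve_version` and its `semver_dir` parameter are hypothetical names, and `importlib.metadata` is used here as a stand-in for whatever package resource API the helper really calls.

```python
import os
from importlib import metadata


def resolve_version(package_name, semver_dir):
    """Return the installed version, else the .semver contents, else "Unknown".

    Hypothetical helper: the name and parameters are assumptions made for
    this sketch and are not part of nyawc.
    """
    try:
        # Installed packages expose their version through package metadata.
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        pass

    # Not installed: fall back to a .semver file in the given directory.
    semver_path = os.path.join(semver_dir, ".semver")
    try:
        with open(semver_path, "r", encoding="utf-8") as handle:
            version = handle.read().strip()
    except OSError:
        return "Unknown"

    # An empty file is treated like a missing one.
    return version or "Unknown"
```

Returning the literal string "Unknown" instead of raising keeps callers that only display the version from failing on a corrupt install.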

static rst_to_pypi(contents)[source]

Convert the given GitHub RST contents to PyPI RST contents (since some RST directives are not available on PyPI).

Parameters:contents (str) – The GitHub compatible RST contents.
Returns:The PyPI compatible RST contents.
Return type:str

nyawc.helpers.RandomInputHelper module

class nyawc.helpers.RandomInputHelper.RandomInputHelper[source]

A helper for generating random user input.

Note

We need to cache the generated values to prevent infinite crawling loops. For example, if two responses contain the same ?search= form, the randomly generated value must be the same both times. Otherwise the crawler would treat the two resulting requests as different requests and keep queueing new ones.

cache[source]

Cached values of the generated data.

Type:obj
cache = {}[source]
static get_for_type(input_type='text')[source]

Get a random string for the given HTML input type.

Parameters:input_type (str) – The input type (e.g. email).
Returns:The (cached) random value.
Return type:str
static get_random_color()[source]

Get a random color in HEX format (including hash character).

Returns:The random HEX color.
Return type:str
static get_random_email(ltd='com')[source]

Get a random email address with the given top-level domain.

Parameters:ltd (str) – The top-level domain to use (e.g. com).
Returns:The random email.
Return type:str
static get_random_number(length=4)[source]

Get a random number with the given length.

Parameters:length (int) – The length of the number to return.
Returns:The random number.
Return type:str
static get_random_password()[source]

Get a random password that complies with most of the requirements.

Note

This random password is not strong and not “really” random, and should only be used for testing purposes.

Returns:The random password.
Return type:str
static get_random_telephonenumber()[source]

Get a random 10 digit phone number that complies with most of the requirements.

Returns:The random telephone number.
Return type:str
static get_random_text()[source]

Get a random string with the given length.

Parameters:length (int) – The length of the string to return.
Returns:The random string.
Return type:str
static get_random_url(ltd='com')[source]

Get a random URL with the given top-level domain.

Parameters:ltd (str) – The top-level domain to use (e.g. com).
Returns:The random url.
Return type:str
static get_random_value(length=10, character_sets=['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'])[source]

Get a random string with the given length.

Parameters:
  • length (int) – The length of the string to return.
  • character_sets (list) – The character sets to use.
Returns:

The random string.

Return type:

str
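One way such a generator could work is sketched below. It assumes the character sets are merged into a single pool before sampling; whether the real method does this is not stated here, and `random_value` is a hypothetical name. A tuple is used as the default to avoid a shared mutable default argument.

```python
import random


def random_value(length=10, character_sets=(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
        "abcdefghijklmnopqrstuvwxyz")):
    """Build a random string by sampling from the merged character sets."""
    # Assumption: all sets are pooled and sampled uniformly.
    pool = "".join(character_sets)
    return "".join(random.choice(pool) for _ in range(length))


letters = random_value()                  # ten random letters
digits = random_value(4, ["0123456789"])  # four random digits
```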

nyawc.helpers.URLHelper module

class nyawc.helpers.URLHelper.URLHelper[source]

A helper for URL strings.

__cache[source]

Cached values of parsed URL data.

Type:obj
static append_with_data(url, data)[source]

Append the given data (an OrderedDict of key value pairs) to the given URL.

Parameters:
  • url (str) – The URL to append.
  • data (obj) – The key value OrderedDict to append to the URL.
Returns:

The new URL.

Return type:

str

static get_hostname(url)[source]

Get the hostname of the given URL.

Parameters:url (str) – The URL to get the hostname from.
Returns:The hostname.
Return type:str
static get_ordered_params(url)[source]

Get the query parameters of the given URL in alphabetical order.

Parameters:url (str) – The URL to get the query parameters from.
Returns:The query parameters.
Return type:str
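Alphabetical ordering of query parameters can be sketched with the standard library as shown below. `ordered_params` is a hypothetical name, and unlike the helper documented here this sketch returns an OrderedDict and lets `parse_qsl` decode the values.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl, urlparse


def ordered_params(url):
    """Return the query parameters of the URL, sorted by key."""
    pairs = parse_qsl(urlparse(url).query, keep_blank_values=True)
    # Sorting the (key, value) tuples orders them by key, then by value.
    return OrderedDict(sorted(pairs))


params = ordered_params("https://example.com/search?b=2&a=1")
```

Sorting makes two URLs that differ only in parameter order compare equal, which is useful when checking whether a request was already queued.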
static get_path(url)[source]

Get the path (e.g. /page/23) of the given URL.

Parameters:url (str) – The URL to get the path from.
Returns:The path.
Return type:str
static get_protocol(url)[source]

Get the protocol (e.g. http, https or ftp) of the given URL.

Parameters:url (str) – The URL to get the protocol from.
Returns:The URL protocol.
Return type:str
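For comparison, the hostname, path and protocol of a URL can be read with the standard library as sketched below. This shows `urllib.parse.urlparse` only; the exact values the helper methods above return for edge cases (ports, credentials, missing schemes) may differ.

```python
from urllib.parse import urlparse

parts = urlparse("https://www.example.com/page/23?b=2&a=1#top")

protocol = parts.scheme    # the part before "://"
hostname = parts.hostname  # host without port or credentials
path = parts.path          # path without query string or fragment
```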
static get_subdomain(url)[source]

Get the subdomain of the given URL.

Parameters:url (str) – The URL to get the subdomain from.
Returns:The subdomain(s).
Return type:str
static get_tld(url)[source]

Get the TLD (top-level domain) of the given URL.

Parameters:url (str) – The URL to get the TLD from.
Returns:The TLD.
Return type:str
static is_mailto(url)[source]

Check if the given URL is a mailto URL.

Parameters:url (str) – The URL to check.
Returns:True if mailto, False otherwise.
Return type:bool
static is_parsable(url)[source]

Check if the given URL is parsable (make sure it’s a valid URL). If it is parsable, also cache it.

Parameters:url (str) – The URL to check.
Returns:True if parsable, False otherwise.
Return type:bool
static make_absolute(base, relative)[source]

Make the given (relative) URL absolute.

Parameters:
  • base (str) – The absolute URL the relative url was found on.
  • relative (str) – The (possibly relative) url to make absolute.
Returns:

The absolute URL.

Return type:

str
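The standard library offers comparable behaviour through `urllib.parse.urljoin`, shown here as a point of reference. Whether the helper delegates to it is not stated, so treat this as an illustration of the concept only.

```python
from urllib.parse import urljoin

base = "https://example.com/blog/post/1"

# A relative reference is resolved against the directory of the base URL.
sibling = urljoin(base, "../about")

# A root-relative reference keeps only the scheme and host of the base.
rooted = urljoin(base, "/contact")

# An already absolute URL is returned unchanged.
external = urljoin(base, "https://other.example/x")
```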

static query_dict_to_string(query)[source]

Convert an OrderedDict to a query string.

Parameters:query (obj) – The key value object with query params.
Returns:The query string.
Return type:str

Note

This method does the same as urllib.parse.urlencode except that it doesn’t actually encode the values.
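The difference from `urllib.parse.urlencode` can be illustrated as follows. `join_query` is a hypothetical stand-in that joins keys and values without any encoding; it is a sketch of the behaviour the note describes, not the helper's source.

```python
from collections import OrderedDict
from urllib.parse import urlencode


def join_query(query):
    """Join key value pairs into a query string without encoding them."""
    return "&".join(f"{key}={value}" for key, value in query.items())


query = OrderedDict([("q", "a b"), ("lang", "en/us")])

raw = join_query(query)     # values are left exactly as given
encoded = urlencode(query)  # values are percent-encoded
```

Leaving values untouched avoids double encoding when the values were already encoded in the page they were scraped from.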

static query_string_to_dict(query)[source]

Convert a string to a query dict.

Parameters:query (str) – The query string.
Returns:The key value object with query params.
Return type:obj

Note

This method does the same as urllib.parse.parse_qsl except that it doesn’t actually decode the values.
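The reverse direction can be sketched the same way. `split_query` is a hypothetical stand-in that splits on `&` and `=` without decoding, shown next to `urllib.parse.parse_qsl`, which does decode.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl


def split_query(query):
    """Split a query string into an OrderedDict without decoding values."""
    result = OrderedDict()
    for pair in query.split("&"):
        if not pair:
            continue  # skip empty segments such as a trailing "&"
        key, _, value = pair.partition("=")
        result[key] = value
    return result


raw = split_query("q=a%20b&lang=en")
decoded = parse_qsl("q=a%20b&lang=en")
```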

static remove_hash(url)[source]

Remove the #hash from the given URL.

Parameters:url (str) – The URL to remove the hash from.
Returns:The URL without the hash.
Return type:str
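For reference, the standard library strips fragments with `urllib.parse.urldefrag`. The sketch below shows that function only and makes no claim about how the helper itself is implemented.

```python
from urllib.parse import urldefrag

# urldefrag returns a (url, fragment) pair.
without_hash, fragment = urldefrag("https://example.com/page?x=1#section-2")

# A URL without a fragment comes back unchanged with an empty fragment.
unchanged, empty = urldefrag("https://example.com/page")
```

Dropping the fragment matters for crawling because fragments are never sent to the server, so URLs that differ only in their hash point at the same resource.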