nyawc.helpers package

Submodules

nyawc.helpers.HTTPRequestHelper module

class nyawc.helpers.HTTPRequestHelper.HTTPRequestHelper[source]

Bases: object

A helper for the src.http.Request module.

static complies_with_scope(queue_item, new_request, scope)[source]

Check if the new request complies with the crawling scope.

Parameters:
Returns:

True if it complies, False otherwise.

Return type:

bool

static patch_with_options(request, options, parent_queue_item=None)[source]

Patch the given request with the given options (e.g. user agent).

Parameters:

nyawc.helpers.RandomInputHelper module

class nyawc.helpers.RandomInputHelper.RandomInputHelper[source]

Bases: object

A helper for generating random user input.

Note

We cache the generated values to prevent infinite crawling loops. For example, if two responses contain the same ?search= form, the randomly generated value must be the same both times; otherwise the crawler would treat the resulting requests as two different requests.

cache[source]

obj – Cached values of the generated data.

cache = {}[source]
static get_for_type(input_type='text')[source]

Get a random string for the given HTML input type.

Parameters:input_type (str) – The input type (e.g. email).
Returns:The (cached) random value.
Return type:str
static get_random_color()[source]

Get a random color in HEX format (including hash character).

Returns:The random HEX color.
Return type:str
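As a rough illustration of a HEX color that includes the hash character, a standard-library sketch might look like the following. The name `random_hex_color` and the lowercase output are assumptions; the library may format its value differently.

```python
import random


def random_hex_color():
    """Return a random color such as '#1a2b3c' (hash plus six hex digits)."""
    # 0xFFFFFF is the largest 24-bit RGB value; zero-pad to six digits.
    return "#{:06x}".format(random.randint(0, 0xFFFFFF))
```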
static get_random_email(ltd='com')[source]

Get a random email address with the given TLD.

Parameters:ltd (str) – The TLD to use (e.g. com).
Returns:The random email.
Return type:str
static get_random_number(length=4)[source]

Get a random number with the given length.

Parameters:length (int) – The length of the number to return.
Returns:The random number.
Return type:str
static get_random_password()[source]

Get a random password that complies with most of the requirements.

Note

This random password is not strong and not “really” random, and should only be used for testing purposes.

Returns:The random password.
Return type:str
static get_random_telephonenumber()[source]

Get a random 10-digit phone number that complies with most of the requirements.

Returns:The random telephone number.
Return type:str
static get_random_text()[source]

Get a random string. The signature takes no arguments, so the length cannot be set by the caller.

Parameters:None.
Returns:The random string.
Return type:str
static get_random_url(ltd='com')[source]

Get a random URL with the given TLD.

Parameters:ltd (str) – The TLD to use (e.g. com).
Returns:The random url.
Return type:str
static get_random_value(length=10, character_sets=['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'])[source]

Get a random string with the given length.

Parameters:
  • length (int) – The length of the string to return.
  • character_sets (list) – The character sets to use.
Returns:

The random string.

Return type:

str
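A minimal sketch of drawing a fixed-length string from several character sets, assuming the sets are simply concatenated into one pool. The name `random_value` is hypothetical, and the library's actual selection strategy is not documented here.

```python
import random
import string


def random_value(length=10,
                 character_sets=(string.ascii_uppercase,
                                 string.ascii_lowercase)):
    """Return `length` characters drawn uniformly from the combined sets."""
    # Assumption: every character of every set is equally likely.
    pool = "".join(character_sets)
    return "".join(random.choice(pool) for _ in range(length))
```

The sketch uses a tuple as the default rather than a list, which avoids sharing one mutable default object between calls.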

nyawc.helpers.URLHelper module

class nyawc.helpers.URLHelper.URLHelper[source]

Bases: object

A helper for URL strings.

cache[source]

obj – Cached values of parsed URL data.

static append_with_data(url, data)[source]

Append the given key-value data (an OrderedDict) to the given URL.

Parameters:
  • url (str) – The URL to append the data to.
  • data (obj) – The key-value OrderedDict to append to the URL.
Returns:

The new URL.

Return type:

str
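One way to append key-value data to a URL's query string with `urllib.parse` is sketched below. The name `append_query_data` is hypothetical, and overwriting an existing parameter that has the same key is an assumption rather than documented behavior.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_query_data(url, data):
    """Return `url` with the items of `data` merged into its query string."""
    parts = urlsplit(url)
    # Keep the existing parameters in their original order.
    params = OrderedDict(parse_qsl(parts.query, keep_blank_values=True))
    # Assumption: new values replace existing ones with the same key.
    params.update(data)
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(params), parts.fragment))
```

For example, `append_query_data("http://example.com/search?q=a", OrderedDict([("page", "2")]))` yields `http://example.com/search?q=a&page=2`.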

cache = {}[source]
static get_hostname(url)[source]

Get the hostname of the given URL.

Parameters:url (str) – The URL to get the hostname from.
Returns:The hostname.
Return type:str
static get_ordered_params(url)[source]

Get the query parameters of the given URL.

Parameters:url (str) – The URL to get the query parameters from.
Returns:The query parameters.
Return type:str
static get_path(url)[source]

Get the path (e.g. /page/23) of the given URL.

Parameters:url (str) – The URL to get the path from.
Returns:The path.
Return type:str
static get_protocol(url)[source]

Get the protocol (e.g. http, https or ftp) of the given URL.

Parameters:url (str) – The URL to get the protocol from.
Returns:The URL protocol.
Return type:str
static get_subdomain(url)[source]

Get the subdomain of the given URL.

Parameters:url (str) – The URL to get the subdomain from.
Returns:The subdomain(s).
Return type:str
static get_tld(url)[source]

Get the TLD of the given URL.

Parameters:url (str) – The URL to get the TLD from.
Returns:The TLD.
Return type:str
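The component getters above (protocol, hostname, path, subdomain, TLD) can be approximated with `urllib.parse.urlsplit`. The sketch below is not the library's implementation; `split_url` is a hypothetical name, and the hostname split is deliberately naive: it treats the last label as the TLD, so multi-part suffixes such as co.uk are not handled.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return the main components of `url` as a dict of strings."""
    parts = urlsplit(url)
    hostname = parts.hostname or ""
    labels = hostname.split(".") if hostname else []
    return {
        "protocol": parts.scheme,
        "hostname": hostname,
        "path": parts.path,
        # Naive assumption: the TLD is the last dot-separated label.
        "tld": labels[-1] if len(labels) > 1 else "",
        # Everything before the registered domain and the TLD.
        "subdomain": ".".join(labels[:-2]) if len(labels) > 2 else "",
    }
```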
static is_mailto(url)[source]

Check if the given URL is a mailto URL.

Parameters:url (str) – The URL to check.
Returns:True if mailto, False otherwise.
Return type:bool
static is_parsable(url)[source]

Check if the given URL is parsable (make sure it’s a valid URL). If it is parsable, also cache it.

Parameters:url (str) – The URL to check.
Returns:True if parsable, False otherwise.
Return type:bool
static make_absolute(base, relative)[source]

Make the given (relative) URL absolute.

Parameters:
  • base (str) – The absolute URL the relative url was found on.
  • relative (str) – The (possibly relative) url to make absolute.
Returns:

The absolute URL.

Return type:

str
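Resolving a possibly relative URL against the page it was found on is what the standard library's `urllib.parse.urljoin` does. The wrapper below, with the hypothetical name `make_absolute_sketch`, only illustrates the idea and may differ from the library's own handling of edge cases.

```python
from urllib.parse import urljoin


def make_absolute_sketch(base, relative):
    """Resolve `relative` against `base`; absolute URLs pass through."""
    return urljoin(base, relative)


base = "http://example.com/a/b.html"
print(make_absolute_sketch(base, "c.html"))               # http://example.com/a/c.html
print(make_absolute_sketch(base, "../c"))                 # http://example.com/c
print(make_absolute_sketch(base, "https://other.test/"))  # https://other.test/
```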