nyawc.helpers package¶
Submodules¶
nyawc.helpers.HTTPRequestHelper module¶
-
class
nyawc.helpers.HTTPRequestHelper.
HTTPRequestHelper
[source]¶ Bases:
object
A helper for the nyawc.http.Request module.
-
static
complies_with_scope
(queue_item, new_request, scope)[source]¶ Check if the new request complies with the crawling scope.
Parameters: - queue_item (
nyawc.QueueItem
) – The parent queue item of the new request. - new_request (
nyawc.http.Request
) – The request to check. - scope (
nyawc.Options.OptionsScope
) – The scope to check.
Returns: True if it complies, False otherwise.
Return type: bool
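As a rough illustration of what a scope check of this kind might look like, the sketch below compares the protocol and hostname of a parent URL with those of a new URL. The `Scope` dataclass, its field names, and `complies_with_scope_sketch` are hypothetical and are not taken from nyawc. The real `nyawc.Options.OptionsScope` may expose different settings, and the real method takes a queue item and a request rather than two URL strings.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Scope:
    # Hypothetical scope settings; not nyawc's actual OptionsScope fields.
    protocol_must_match: bool = False
    hostname_must_match: bool = True


def complies_with_scope_sketch(parent_url, new_url, scope):
    """Return True if new_url stays within the scope relative to parent_url."""
    parent = urlparse(parent_url)
    new = urlparse(new_url)

    # Reject as soon as one enabled rule is violated.
    if scope.protocol_must_match and parent.scheme != new.scheme:
        return False
    if scope.hostname_must_match and parent.hostname != new.hostname:
        return False
    return True
```

Each rule is checked independently, so further (assumed) rules could be added as extra `if` statements without touching the existing ones.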
-
static
patch_with_options
(request, options, parent_queue_item=None)[source]¶ Patch the given request with the given options (e.g. user agent).
Parameters: - request (
nyawc.http.Request
) – The request to patch. - options (
nyawc.Options
) – The options to patch the request with. - parent_queue_item (
nyawc.QueueItem
) – The parent queue item object (request/response pair) if exists.
-
static
nyawc.helpers.RandomInputHelper module¶
-
class
nyawc.helpers.RandomInputHelper.
RandomInputHelper
[source]¶ Bases:
object
A helper for generating random user input.
Note
We need to cache the generated values to prevent infinite crawling loops. For example, if two responses contain the same ?search= form, the randomly generated value must be the same both times. Otherwise the crawler would treat the resulting requests as two different requests.
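The caching idea described in this note can be sketched as follows. This is a minimal, standalone illustration and not nyawc's implementation: `_cache` and `get_for_type_sketch` are hypothetical names, and the sketch assumes that the input type alone is used as the cache key.

```python
import random
import string

# Module-level cache mapping an input type to its generated value.
_cache = {}


def get_for_type_sketch(input_type="text"):
    """Return a random value for input_type, reusing any earlier value."""
    if input_type not in _cache:
        # Generate the value only once per input type.
        _cache[input_type] = "".join(
            random.choice(string.ascii_letters) for _ in range(10)
        )
    return _cache[input_type]
```

Because the second call for the same input type returns the stored value, two identical forms produce identical requests, which a crawler can then recognise as duplicates.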
-
cache
= {}[source]
-
static
get_for_type
(input_type='text')[source]¶ Get a random string for the given HTML input type.
Parameters: input_type (str) – The input type (e.g. email). Returns: The (cached) random value. Return type: str
-
static
get_random_color
()[source]¶ Get a random color in HEX format (including hash character).
Returns: The random HEX color. Return type: str
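One way to produce a value of the documented shape (a hash character followed by hex digits) is sketched below. The six-digit lowercase format and the name `random_hex_color` are assumptions; the library may format its colors differently.

```python
import random


def random_hex_color():
    """Return a color such as '#a1b2c3' (hash plus six hex digits)."""
    # 0xFFFFFF is the largest 24-bit RGB value; pad to six digits.
    return "#{:06x}".format(random.randint(0, 0xFFFFFF))
```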
-
static
get_random_email
(ltd='com')[source]¶ Get a random email address with the given top-level domain (ltd).
Parameters: ltd (str) – The top-level domain to use (e.g. com). Returns: The random email. Return type: str
-
static
get_random_number
(length=4)[source]¶ Get a random number with the given length.
Parameters: length (int) – The length of the number to return. Returns: The random number. Return type: str
-
static
get_random_password
()[source]¶ Get a random password that complies with most of the requirements.
Note
This random password is not strong and not “really” random, and should only be used for testing purposes.
Returns: The random password. Return type: str
-
static
get_random_telephonenumber
()[source]¶ Get a random 10 digit phone number that complies with most of the requirements.
Returns: The random telephone number. Return type: str
-
static
get_random_text
()[source]¶ Get a random string with the given length.
Parameters: length (int) – The length of the string to return. Returns: The random string. Return type: str
-
static
get_random_url
(ltd='com')[source]¶ Get a random URL with the given top-level domain (ltd).
Parameters: ltd (str) – The top-level domain to use (e.g. com). Returns: The random URL. Return type: str
-
static
get_random_value
(length=10, character_sets=['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'])[source]¶ Get a random string with the given length.
Parameters: - length (int) – The length of the string to return.
- character_sets (list) – The character sets to use.
Returns: The random string.
Return type: str
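A minimal sketch of drawing a random string from a list of character sets is shown below. It is an illustration under assumptions rather than the library's code: `random_value` is a hypothetical name, and the sketch assumes that all sets are merged into one alphabet before sampling.

```python
import random
import string


def random_value(length=10, character_sets=None):
    """Return a random string of `length` characters from the given sets."""
    if character_sets is None:
        character_sets = [string.ascii_uppercase, string.ascii_lowercase]
    # Merge all sets into one alphabet and sample from it uniformly.
    alphabet = "".join(character_sets)
    return "".join(random.choice(alphabet) for _ in range(length))
```

The sketch uses `None` as the default instead of a list literal, so the default list is not shared between calls.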
-
nyawc.helpers.URLHelper module¶
-
class
nyawc.helpers.URLHelper.
URLHelper
[source]¶ Bases:
object
A helper for URL strings.
-
static
append_with_data
(url, data)[source]¶ Append the given data OrderedDict to the given URL as query parameters.
Parameters: - url (str) – The URL to append.
- data (obj) – The key value OrderedDict to append to the URL.
Returns: The new URL.
Return type: str
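Appending ordered key/value data to a URL can be sketched with the standard library as below. `append_query_data` is a hypothetical name, and the behavior for keys that already exist in the URL (here: the new value replaces the old one) is an assumption, since the description above does not specify it.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def append_query_data(url, data):
    """Return url with the key/value pairs in data added to its query string."""
    parts = urlparse(url)
    # Keep existing parameters in order, then add or overwrite with data.
    params = OrderedDict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(data)
    return urlunparse(parts._replace(query=urlencode(params)))
```

For example, `append_query_data("https://example.com/search?a=1", OrderedDict([("b", "2")]))` returns `"https://example.com/search?a=1&b=2"`.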
-
cache
= {}[source]
-
static
get_hostname
(url)[source]¶ Get the hostname of the given URL.
Parameters: url (str) – The URL to get the hostname from. Returns: The hostname. Return type: str
-
static
get_ordered_params
(url)[source]¶ Get the query parameters of the given URL.
Parameters: url (str) – The URL to get the query parameters from. Returns: The query parameters. Return type: str
-
static
get_path
(url)[source]¶ Get the path (e.g. /page/23) of the given URL.
Parameters: url (str) – The URL to get the path from. Returns: The path. Return type: str
-
static
get_protocol
(url)[source]¶ Get the protocol (e.g. http, https or ftp) of the given URL.
Parameters: url (str) – The URL to get the protocol from. Returns: The URL protocol. Return type: str
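For orientation, the protocol, hostname, and path of a URL can be extracted with Python's standard `urllib.parse`, as in the sketch below. `url_components` is a hypothetical helper, and the library's own definitions (for instance, what exactly counts as the hostname) may differ from what `urlparse` reports.

```python
from urllib.parse import urlparse


def url_components(url):
    """Return the protocol, hostname, and path of url as a dict."""
    parts = urlparse(url)
    return {
        "protocol": parts.scheme,    # e.g. "https"
        "hostname": parts.hostname,  # full host, lowercased by urlparse
        "path": parts.path,          # e.g. "/page/23"
    }
```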
-
static
get_subdomain
(url)[source]¶ Get the subdomain of the given URL.
Parameters: url (str) – The URL to get the subdomain from. Returns: The subdomain(s). Return type: str
-
static
get_tld
(url)[source]¶ Get the TLD (top-level domain) of the given URL.
Parameters: url (str) – The URL to get the TLD from. Returns: The TLD. Return type: str
-
static
is_mailto
(url)[source]¶ Check if the given URL is a mailto URL.
Parameters: url (str) – The URL to check. Returns: True if mailto, False otherwise. Return type: bool
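A check of this kind can be as simple as testing the URL's scheme prefix. The sketch below is an assumption about how such a check could work, with the hypothetical name `is_mailto_sketch`; it ignores surrounding whitespace and letter case, which the library may or may not do.

```python
def is_mailto_sketch(url):
    """Return True if url uses the mailto: scheme."""
    # Normalise whitespace and case before comparing the prefix.
    return url.strip().lower().startswith("mailto:")
```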
-
static