nyawc.http package

Submodules

nyawc.http.Handler module

class nyawc.http.Handler.Handler(options, queue_item)

Bases: object

The Handler class executes HTTP requests.

__options

The settings/options object.

Type: obj

__queue_item

The queue item containing a request to execute.

Type: obj

_Handler__content_type_matches(content_type, available_content_types)

Check if the given content type matches one of the available content types.

Parameters:
  • content_type (str) – The given content type.
  • available_content_types (list) – All the available content types.
Returns:

True if a match was found, False otherwise.

Return type:

bool
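
The check described above can be sketched as a standalone function. This is only an illustration: the function name, the handling of parameters such as `charset`, and the case-insensitive comparison are assumptions, and the actual private method in nyawc may match differently.

```python
def content_type_matches(content_type, available_content_types):
    """Return True if content_type matches one of available_content_types.

    Hypothetical stand-in for the private Handler method; not nyawc code.
    """
    if content_type is None:
        return False

    # Assumption: ignore parameters such as "; charset=utf-8" and compare
    # the bare media type case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()

    return any(
        media_type == available.strip().lower()
        for available in available_content_types
    )


html_types = ["text/html", "application/xhtml+xml"]
is_html = content_type_matches("text/html; charset=UTF-8", html_types)
is_png_html = content_type_matches("image/png", html_types)
```

Under these assumptions `is_html` is True and `is_png_html` is False.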

_Handler__get_all_scrapers()

Find all available scraper references.

Returns: The scraper references.
Return type: list(obj)

_Handler__get_all_scrapers_modules()

Find all available scraper modules.

Returns: The scraper modules.
Return type: list(obj)

_Handler__make_request(url, method, data, auth, cookies, headers, proxies, timeout, verify)

Execute a request with the given data.

Parameters:
  • url (str) – The URL to call.
  • method (str) – The method (e.g. get or post).
  • data (str) – The data to call the URL with.
  • auth (obj) – The authentication class.
  • cookies (obj) – The cookie dict.
  • headers (obj) – The header dict.
  • proxies (obj) – The proxies dict.
  • timeout (int) – The request timeout in seconds.
  • verify (mixed) – SSL verification.
Returns:

The response object.

Return type:

obj
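
To show how these parameters relate to one another, the sketch below collects them into the keyword arguments accepted by `requests.request` from Python's requests module. That the handler delegates to `requests.request` is an assumption, and `build_request_kwargs` is a hypothetical helper that is not part of nyawc.

```python
def build_request_kwargs(url, method, data=None, auth=None, cookies=None,
                         headers=None, proxies=None, timeout=30, verify=True):
    """Collect the documented parameters into one keyword-argument dict.

    Hypothetical helper for illustration only; not part of nyawc.
    """
    return {
        "method": method,      # e.g. "get" or "post"
        "url": url,            # the URL to call
        "data": data,          # the data to call the URL with
        "auth": auth,          # authentication class
        "cookies": cookies,    # cookie dict
        "headers": headers,    # header dict
        "proxies": proxies,    # proxies dict
        "timeout": timeout,    # request timeout in seconds
        "verify": verify,      # SSL verification (bool or bundle path)
    }


kwargs = build_request_kwargs(
    "https://example.com/",
    "get",
    headers={"User-Agent": "example-crawler"},
    timeout=10,
)

# With the requests module installed, the call could then look like:
#     response = requests.request(**kwargs)
```

Keeping the arguments in one dict makes it easy to log or inspect exactly what would be sent before the request is executed.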

__init__(options, queue_item)

Construct the HTTP handler.

Parameters:
  • options (obj) – The settings/options object.
  • queue_item (obj) – The queue item containing a request to execute.

get_new_requests()

Retrieve all the new requests that were found in this request.

Returns: A list of request objects.
Return type: list(nyawc.http.Request)

nyawc.http.Request module

class nyawc.http.Request.Request(url, method='get', data=None, auth=None, cookies=None, headers=None, proxies=None, timeout=30, verify=True)

Bases: object

The Request class contains details that were used to request the specified URL.

METHOD_OPTIONS

A request method that can be used to request the URL.

Type: str

METHOD_GET

A request method that can be used to request the URL.

Type: str

METHOD_HEAD

A request method that can be used to request the URL.

Type: str

METHOD_POST

A request method that can be used to request the URL.

Type: str

METHOD_PUT

A request method that can be used to request the URL.

Type: str

METHOD_DELETE

A request method that can be used to request the URL.

Type: str

parent_raised_error

Whether the parent request raised an error (e.g. 404).

Type: bool

depth

The current crawling depth.

Type: int

url

The absolute URL to use when making the request.

Type: str

method

The request method to use for the request.

Type: str

data

The post data {key: value} OrderedDict that will be sent.

Type: obj

auth

The (requests module) authentication class to use for the request.

Type: obj

cookies

The (requests module) cookie jar to use for the request.

Type: obj

headers

The headers {key: value} to use for the request.

Type: obj

proxies

The proxies {key: value} to use for the request.

Type: obj

timeout

The number of seconds to wait before a timeout exception is raised.

Type: int

verify

True or False to enable or disable certificate verification, or a path to a trusted CA bundle.

Type: mixed

METHOD_DELETE = 'delete'
METHOD_GET = 'get'
METHOD_HEAD = 'head'
METHOD_OPTIONS = 'options'
METHOD_POST = 'post'
METHOD_PUT = 'put'

__init__(url, method='get', data=None, auth=None, cookies=None, headers=None, proxies=None, timeout=30, verify=True)

Constructs a Request instance.

Parameters:
  • url (str) – The absolute URL to use when making the request.
  • method (str) – The request method to use for the request.
  • data (obj) – The post data {key: value} OrderedDict that will be sent.
  • auth (obj) – The (requests module) authentication class to use for the request.
  • cookies (obj) – The (requests module) cookie jar to use for the request.
  • headers (obj) – The headers {key: value} to use for the request.
  • proxies (obj) – The proxies {key: value} to use for the request.
  • timeout (int) – The number of seconds to wait before a timeout exception is raised.
  • verify (mixed) – True or False to enable or disable certificate verification, or a path to a trusted CA bundle.
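
A minimal sketch of constructing requests with the signature documented above. The fallback class is a stand-in that only mirrors the documented constructor and two of the method constants, so that the example also runs when nyawc is not installed; the URLs and form data are made up for illustration.

```python
from collections import OrderedDict

try:
    from nyawc.http.Request import Request
except ImportError:
    # Stand-in mirroring the documented signature; not the real class.
    class Request(object):
        METHOD_GET = "get"
        METHOD_POST = "post"

        def __init__(self, url, method="get", data=None, auth=None,
                     cookies=None, headers=None, proxies=None,
                     timeout=30, verify=True):
            self.url = url
            self.method = method
            self.data = data
            self.auth = auth
            self.cookies = cookies
            self.headers = headers
            self.proxies = proxies
            self.timeout = timeout
            self.verify = verify


# Only the URL is required; everything else falls back to the defaults.
default_request = Request("https://example.com/")

# A POST request with form data and a shorter timeout.
login_request = Request(
    "https://example.com/login",
    method=Request.METHOD_POST,
    data=OrderedDict([("username", "alice"), ("password", "secret")]),
    timeout=10,
)
```

Using the `METHOD_*` constants instead of string literals avoids typos in the method name.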

nyawc.http.Response module

class nyawc.http.Response.Response(url)

Bases: object

Placeholder response class that is used until the request has finished.

url

The absolute URL of the request/response.

Type: str

Note

This class will be replaced with the response class of Python’s requests module when the request is finished. For more information check http://docs.python-requests.org/en/master/api/#requests.Response.

__init__(url)

Constructs a Response instance.

Parameters: url (str) – The absolute URL of the request/response.