Migration

Table of Contents

From 1.5 to 1.6

Headers have default values and are case insensitive

From now on the headers identity option has default values and is a case insensitive dict. When changing headers the .update() method should be used so the default headers remain.

# Old
options.identity.headers = {
    "User-Agent": "MyCustomUserAgent"
}

# New
options.identity.headers.update({
    "User-Agent": "MyCustomUserAgent"
})

New default user agent

The default user agent for the crawler has changed. In version 1.5 it was a fake Chrome user agent and from now on it is nyawc/1.6.0 CPython/3.6.1 Windows/10 based on the versions you use.

The Chrome user agent from version 1.5 can still be faked by using the code below.

options.identity.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
})

From 1.4 to 1.5

Renamed the domain must match scope option

Since version 1.5 the domain_must_match option is now called hostname_must_match.

# Old
Options().scope.domain_must_match = True/False

# New
Options().scope.hostname_must_match = True/False