fcs.server.url_processor

This module contains a class for processing and unifying URLs.

class URLProcessor

Processes and corrects URLs retrieved from a website and provides other methods that operate on web addresses (these methods are used, e.g., by crawl depth policy classes).

static validate(link, domain=None)

Validates and unifies a link.

Parameters:
  • link (string) – Link to validate.
  • domain (string) – Base URL to combine with the link if the link does not start with ‘http://’ or ‘https://’.
Returns:

Validated link.

Return type:

string
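A rough idea of what validating and unifying a link can involve is sketched below using only the standard library's urllib.parse. validate_sketch is a hypothetical stand-alone function, not the module's actual implementation. It assumes that "unifying" means resolving a relative link against domain, lower-casing the scheme and host, and discarding any #fragment.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def validate_sketch(link, domain=None):
    """Hypothetical approximation of URLProcessor.validate.

    Assumption: "unifying" = resolve relative links against `domain`,
    lower-case scheme and host, and drop any #fragment.
    """
    link = link.strip()
    # Only links without an explicit http(s) scheme are combined with the base URL.
    if domain is not None and not link.lower().startswith(("http://", "https://")):
        link = urljoin(domain, link)
    parts = urlsplit(link)
    # Scheme and host are case-insensitive, so they are lower-cased; the path
    # and query are left untouched because they may be case-sensitive.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, ""))
```

For example, validate_sketch("z9.php", "http://allegro.pl/country_pages/1/0/") resolves the relative link to "http://allegro.pl/country_pages/1/0/z9.php", while an absolute link such as "https://example.com/x?q=1" is returned unchanged.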

static identical_hosts(link_a, link_b)

Compares the hosts of link_a and link_b.

Parameters:
  • link_a (string) – First link.
  • link_b (string) – Second link.
Returns:

Whether the hosts of both links are identical.

Return type:

bool
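A host comparison of this kind might look like the hypothetical identical_hosts_sketch below, which relies on urllib.parse.urlsplit. It compares host names only, so the scheme, port and path are ignored. The description above does not say whether www.allegro.pl and allegro.pl count as the same host; this sketch treats them as different.

```python
from urllib.parse import urlsplit


def identical_hosts_sketch(link_a, link_b):
    """Hypothetical approximation of URLProcessor.identical_hosts.

    Only host names are compared; scheme, port, path and query are ignored.
    """
    # .hostname is lower-cased by urllib and excludes the port;
    # it is None when the link has no network location (e.g. "/a").
    host_a = urlsplit(link_a).hostname
    host_b = urlsplit(link_b).hostname
    return host_a is not None and host_a == host_b
```

Returning False when either link has no host keeps two relative links from being reported as sharing a host.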

static generate_url_hierarchy(link)

Returns a list of all URLs that are component parts of the given link, i.e. the URLs obtained by successively trimming path segments from the end of the link. For example, if link is http://www.allegro.pl/country_pages/1/0/z9.php, the method returns the following list: [‘http://allegro.pl’, ‘http://allegro.pl/country_pages’, ‘http://allegro.pl/country_pages/1’, ‘http://allegro.pl/country_pages/1/0’].

Parameters:
  • link (string) – Link from which the resulting list will be generated.
Returns:

All URLs generated by trimming the link.

Return type:

list
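The trimming described for generate_url_hierarchy can be approximated as follows. generate_url_hierarchy_sketch is hypothetical, not the module's code. It drops a leading "www." from the host only because the documented example does so, and it omits the last path segment so that the link itself is not returned, again following the example.

```python
from urllib.parse import urlsplit


def generate_url_hierarchy_sketch(link):
    """Hypothetical approximation of URLProcessor.generate_url_hierarchy.

    Assumptions inferred from the documented example: a leading "www." is
    removed from the host, and the last path segment is not included.
    """
    parts = urlsplit(link)
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    base = f"{parts.scheme}://{host}"
    # Non-empty path segments, e.g. ['country_pages', '1', '0', 'z9.php'].
    segments = [s for s in parts.path.split("/") if s]
    # The host root always comes first, followed by every proper path prefix.
    hierarchy = [base]
    for i in range(1, len(segments)):
        hierarchy.append(base + "/" + "/".join(segments[:i]))
    return hierarchy
```

Called with http://www.allegro.pl/country_pages/1/0/z9.php, this sketch returns the same four URLs as the example in the method description.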