fcs.server.crawling_depth_policy

This module contains the policies used to compute crawling depth.

class BaseCrawlingDepthPolicy

This is a base class for crawling depth policy implementations.

static calculate_depth()

Returns crawling depth.

Returns: Crawling depth.
Return type: int

class IgnoreDepthPolicy

Implementation that ignores depth. Calculated depth is always 0.
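As a rough illustration of the shape these policy classes take, here is a minimal sketch of a base class with a static calculate_depth() and a subclass that always returns 0. It is not the module's actual source. Two things are assumptions made for the sketch: the base class raising NotImplementedError, and both methods accepting arbitrary arguments so that policies can be swapped without changing the caller.

```python
class BaseCrawlingDepthPolicy:
    """Base class; concrete policies override calculate_depth()."""

    @staticmethod
    def calculate_depth(*args, **kwargs):
        # Assumption: the base class provides no usable default.
        raise NotImplementedError


class IgnoreDepthPolicy(BaseCrawlingDepthPolicy):
    """Policy that ignores depth entirely."""

    @staticmethod
    def calculate_depth(*args, **kwargs):
        # Every link is treated as if it were at depth 0,
        # whatever arguments the caller passes.
        return 0
```

Because the method is static, it can be called on the class itself, for example IgnoreDepthPolicy.calculate_depth(), without creating an instance.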

static calculate_depth()

Always returns 0.

Returns: Crawling depth (0).
Return type: int

class SimpleCrawlingDepthPolicy

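Depth is computed according to the following rules, where an asterisk (*) marks a new domain. Following a link to a different domain resets the depth to 0, and the depth stays reset even when a later link returns to an earlier domain (rule 3).

  1. A.com -> *B.com => depth_2 = 0
  2. A.com -> A.com/aaa/ => depth_2 = depth_1 + 1
  3. A.com -> *B.com -> A.com/aaa/ => depth_1 = x, depth_2 = 0, depth_3 = 0

static calculate_depth(link=None, source_url=None, depth=None)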
Parameters:
  • link (string) – Address of site for which crawling depth is computed.
  • source_url (string) – Address of site from which link has been retrieved.
  • depth (int) – Depth of the source_url page.
Returns: Crawling depth.
Return type: int
Raises ValueError: if a URL is invalid.
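
The three rules above can be sketched as a single comparison of the link's domain with the domain of the page it was found on. This is a minimal illustration, not the library's implementation. The names simple_depth and _domain are hypothetical. The sketch also assumes that "domain" means the full host name (so www.a.com and a.com would count as different domains) and that URLs carry a scheme such as http://.

```python
from urllib.parse import urlparse


def _domain(url):
    """Return the lower-cased host name of url, or raise ValueError."""
    host = urlparse(url).hostname
    if not host:
        # Covers empty strings and scheme-less input such as "A.com".
        raise ValueError("invalid URL: %r" % (url,))
    return host


def simple_depth(link, source_url, depth):
    """Depth of link, given the page it was found on and that page's depth."""
    if _domain(link) != _domain(source_url):
        return 0       # rules 1 and 3: a domain change resets the depth
    return depth + 1   # rule 2: same domain, one level deeper


x = 2  # depth_1, the depth of http://a.com/
d2 = simple_depth("http://b.com/", "http://a.com/", x)        # rule 1
d3 = simple_depth("http://a.com/aaa/", "http://b.com/", d2)   # rule 3
print(d2, d3)                                                 # 0 0
print(simple_depth("http://a.com/aaa/", "http://a.com/", x))  # 3 (rule 2)
```

Since the function only sees the immediate source page, it cannot tell that A.com was visited before B.com, which is why rule 3 ends at depth 0.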

class RealDepthCrawlingDepthPolicy

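Depth is computed according to the following rules, where an asterisk (*) marks a new domain. Rules 1 and 2 match SimpleCrawlingDepthPolicy; rule 3 differs, because a link that returns to A.com after a detour through B.com gets depth x + 1 rather than 0.

  1. A.com -> *B.com => depth_2 = 0
  2. A.com -> A.com/aaa/ => depth_2 = depth_1 + 1
  3. A.com -> *B.com -> A.com/aaa/ => depth_1 = x, depth_2 = 0, depth_3 = x + 1

static calculate_depth(link=None, link_db=None)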
Parameters:
  • link (string) – Address of site for which crawling depth is computed.
  • link_db (BaseLinkDB) – Database Access Object, extending BaseLinkDB.
Returns: Crawling depth.
Return type: int
Raises ValueError: if a URL is invalid.
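
One way to obtain the behaviour in rule 3 is to look up the nearest already-known parent page of the link and add 1 to its stored depth. The sketch below illustrates that idea only. The BaseLinkDB interface is not described here, so a plain dict mapping URL to depth (known_depths) stands in for link_db. The names real_depth and _ancestors, the ancestor-lookup strategy itself, and the convention that stored URLs end with a slash are all assumptions.

```python
from urllib.parse import urlparse


def _ancestors(url):
    """Yield the parent pages of url on the same host, nearest first."""
    parts = urlparse(url)
    if not parts.hostname:
        raise ValueError("invalid URL: %r" % (url,))
    root = "%s://%s" % (parts.scheme, parts.netloc.lower())
    segments = [s for s in parts.path.split("/") if s]
    # Drop one trailing path segment at a time: /a/b/ -> /a/ -> /
    for end in range(len(segments) - 1, -1, -1):
        yield root + "/" + "".join(s + "/" for s in segments[:end])


def real_depth(link, known_depths):
    """Depth of link, based on the depths already stored in known_depths."""
    for parent in _ancestors(link):
        if parent in known_depths:
            return known_depths[parent] + 1  # rules 2 and 3
    return 0                                 # rule 1: nothing known above it


known = {"http://a.com/": 2}                   # depth_1 = x = 2
print(real_depth("http://b.com/", known))      # 0
known["http://b.com/"] = 0
print(real_depth("http://a.com/aaa/", known))  # 3
```

Because the depth comes from stored data rather than from the immediate source page, the detour through B.com does not affect the result for A.com/aaa/.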