fcs.crawler.web_interface

This module defines the web methods for managing Crawling Units. Each method is implemented as a class that provides the appropriate POST or GET handler. Requests are encapsulated in JSON messages.

class index

Shows information about the state of the Crawling Unit and its efficiency.

Returns: Diagnostic information (crawler’s state and efficiency).
Return type: string

Passes links to be crawled to the Crawling Unit. Required POST parameters are:

  • id - ID of the package with links
  • links - links to crawl
  • server_address - address of Task Server that sent this package of links
  • mime_type - list of MIME types of data to be crawled
Returns: Confirmation of sending links to the Crawling Unit.
Return type: string
Raises KeyError: if the request body is incorrect (for example, a required parameter is missing)
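
A minimal sketch of a request body carrying the four parameters above, assuming the body is a single JSON object keyed by the parameter names. The value types (an integer ID, lists of strings), the example addresses, and the helper names build_links_request and parse_links_request are illustrative assumptions, not part of this module.

```python
import json

REQUIRED_KEYS = ("id", "links", "server_address", "mime_type")


def build_links_request(package_id, links, server_address, mime_types):
    """Serialize the four required parameters into a JSON request body."""
    payload = {
        "id": package_id,                  # ID of the package with links
        "links": links,                    # links to crawl (assumed: list of URLs)
        "server_address": server_address,  # Task Server that sent the package
        "mime_type": mime_types,           # MIME types to crawl (assumed: list)
    }
    return json.dumps(payload)


def parse_links_request(body):
    """Decode a request body; a missing parameter surfaces as KeyError."""
    data = json.loads(body)
    return {key: data[key] for key in REQUIRED_KEYS}


# Hypothetical example values.
body = build_links_request(
    7, ["http://example.com/"], "http://127.0.0.1:8000", ["text/html"]
)
```

Indexing the decoded dictionary directly, rather than calling dict.get, is one simple way a handler could end up raising KeyError for an incomplete body, as documented above.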
class stop

Stops a Crawling Unit.

Returns: Confirmation of stopping the Crawling Unit.
Return type: string
class kill

Kills a Crawling Unit.

Returns: Confirmation of killing the Crawling Unit.
Return type: string
class alive

Checks whether the Crawling Unit is alive.

Returns: Information that the Crawling Unit is alive.
Return type: string
class stats

Asks the Crawling Unit for statistics from a given time period. The required POST parameter is:

  • seconds - time period for which statistics should be returned (counting since now)
Returns: Crawling statistics from a given time period.
Return type: JSON
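
A minimal sketch of the stats request, assuming the body is a JSON object with a single seconds key and that "counting since now" means the most recent seconds-long window ending at the current time. The helper names build_stats_request and stats_window are hypothetical, and the layout of the returned statistics is not specified here, so it is not modelled.

```python
import json
import time


def build_stats_request(seconds):
    """Serialize the required 'seconds' parameter into a JSON body."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return json.dumps({"seconds": seconds})


def stats_window(seconds, now=None):
    """Return (start, end) timestamps of the assumed reporting window."""
    end = time.time() if now is None else now
    return (end - seconds, end)


# Hypothetical example: statistics for the last minute.
body = build_stats_request(60)
```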
class Server(port=8080, address='0.0.0.0')

Wrapper for the Crawling Unit’s REST API.

Parameters:
  • port (int) – Server’s port.
  • address (string) – Server’s address.
urls

Mapping between URLs and web methods.
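
A minimal, standard-library-only sketch of how such a mapping could route a request to a web-method class. The flat tuple of (pattern, class name) pairs follows a convention used by some Python web frameworks such as web.py, but which framework this module uses, the concrete paths, and the dispatch helper are all assumptions made for illustration. The handler bodies return placeholder strings.

```python
import re


class index:
    def GET(self):
        return "state: idle"  # placeholder diagnostic string


class alive:
    def GET(self):
        return "alive"  # placeholder liveness message


# Hypothetical paths; the real mapping may differ.
urls = (
    "/", "index",
    "/alive", "alive",
)

# Explicit name-to-class lookup used by the sketch below.
HANDLERS = {"index": index, "alive": alive}


def dispatch(path, method):
    """Find the class mapped to `path` and call its `method` handler."""
    for pattern, name in zip(urls[0::2], urls[1::2]):
        if re.fullmatch(pattern, path):
            handler = HANDLERS[name]()
            return getattr(handler, method)()
    raise LookupError("no web method mapped to " + path)
```

Keeping the mapping as data, separate from the handler classes, means a path can be added or changed without touching the classes themselves.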

app

The server runs as a web application. This attribute is the object representing that application.

run()

Runs this server.

kill()

Kills this server.