fcs.server.content_db

This module provides an API for connecting to the database that stores crawled content.

class BerkeleyContentDB(base_name)

API for Berkeley DB (http://www.oracle.com/database/berkeley-db/). It accesses the Berkeley DB library through the interface provided by the bsddb module.

Parameters:base_name (string) – Name of the database
content_db

Object to access Berkeley DB.

id_iter

Number of database records.

get_data_iter

Number of records retrieved from database.

parts_iter

Number of content data packages (files with crawled data) requested by user.

add_content(url, links, content)

Adds crawled content to the database.

Parameters:
  • url (string) – Crawled page URL.
  • links (string) – Links visited during crawling process.
  • content (string) – Crawled content to put into database.
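
The call above can be pictured with a minimal sketch, assuming each record is stored under an incrementing numeric key (in the spirit of id_iter) and serialized as JSON. A plain dict stands in for the Berkeley DB object; the helper name store_record and the record layout are hypothetical and are not taken from the module.

```python
import json


def store_record(db, next_id, url, links, content):
    """Store one crawled page under a numeric key and return the next free id.

    `db` is any mutable mapping of str -> str; a dict stands in for Berkeley DB.
    The JSON record layout is an assumption, not the module's actual format.
    """
    record = {"url": url, "links": links, "content": content}
    db[str(next_id)] = json.dumps(record)
    return next_id + 1


# Hypothetical usage: two pages added in sequence.
db = {}
next_id = 0
next_id = store_record(db, next_id, "http://example.com/",
                       "http://example.com/a", "<html>index</html>")
next_id = store_record(db, next_id, "http://example.com/a",
                       "", "<html>a</html>")
```

Keeping the counter outside the mapping mirrors the documented id_iter attribute, which counts records independently of what is still stored.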
get_file_with_data_package(size)

Returns the path to a file containing crawled data of the given size.

Parameters:size (int) – Size of the requested data in MB.
Returns:Path to file with crawled data.
Return type:string
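
One way such a package could be assembled is sketched below, assuming records are written one per line to a temporary file until the size budget is reached. The function name write_data_package, the line-based file format, and the rule of always writing at least one record are assumptions made for illustration; the module's actual packaging may differ.

```python
import os
import tempfile


def write_data_package(records, size_mb):
    """Write records to a temporary file until about `size_mb` MB are used.

    Returns (path, count), where count is the number of records written.
    At least one record is written even if it exceeds the budget (assumption).
    """
    budget = int(size_mb * 1024 * 1024)
    written = 0
    count = 0
    fd, path = tempfile.mkstemp(suffix=".dat")
    with os.fdopen(fd, "wb") as out:
        for record in records:
            data = record.encode("utf-8") + b"\n"
            # Stop before the record that would push the file over the budget.
            if count > 0 and written + len(data) > budget:
                break
            out.write(data)
            written += len(data)
            count += 1
    return path, count


# Hypothetical usage: roughly 2 MB of records, 1 MB requested.
records = ["x" * 1024] * 2048
path, count = write_data_package(records, 1)
```

The caller is responsible for deleting the temporary file once the package has been delivered.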
size()

Returns the number of elements (i.e. crawled content records) currently in the database. A record that has been retrieved through the web application or the API is no longer available and is not counted.

Returns:Number of elements in database.
Return type:int
added_records_num()

Returns the number of entries added for crawled sites since crawling began, including entries that are no longer available.

Returns:Number of entries added for crawled sites.
Return type:int
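
The difference between size() and added_records_num() can be illustrated with a small in-memory model. This is a sketch, not the real BerkeleyContentDB: it assumes that size() equals id_iter minus get_data_iter and that records are handed out oldest first. The class name CounterModel and the method take_one are hypothetical.

```python
class CounterModel:
    """In-memory model of the two counters; not the real BerkeleyContentDB."""

    def __init__(self):
        self.records = {}        # stands in for the Berkeley DB file
        self.id_iter = 0         # total number of records ever added
        self.get_data_iter = 0   # number of records already handed out

    def add_content(self, url, links, content):
        self.records[self.id_iter] = (url, links, content)
        self.id_iter += 1

    def take_one(self):
        """Hand out the oldest available record; it is unavailable afterwards.

        Raises KeyError when no record is left.
        """
        record = self.records.pop(self.get_data_iter)
        self.get_data_iter += 1
        return record

    def size(self):
        # Only records that have not been retrieved yet are counted.
        return self.id_iter - self.get_data_iter

    def added_records_num(self):
        # Counts every record ever added, retrieved or not.
        return self.id_iter


# Hypothetical usage: three pages added, one retrieved.
model = CounterModel()
for i in range(3):
    model.add_content("http://example.com/%d" % i, "", "page %d" % i)
first = model.take_one()
```

After this sequence the model reports two available records from size() and three from added_records_num().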
clear()

Clears the contents of the database and closes it.

show()

Prints the entries in the database.