fcs.server.content_db

This module provides an API for connecting to the database that stores crawled content.

class BerkeleyContentDB(base_name)

API for Berkeley DB (http://www.oracle.com/database/berkeley-db/). It accesses the Berkeley DB library through the interface provided by the bsddb module.

Parameters:	base_name (string) – Name of the database
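The bsddb module is available only in Python 2; it was removed in Python 3.0. As a rough illustration of the dict-like, bytes-in/bytes-out access that bsddb provides, the sketch below uses the standard-library dbm module as a stand-in. The temporary file location and the stored record are hypothetical.

```python
import dbm
import os
import tempfile

# Hypothetical location; BerkeleyContentDB itself only receives base_name.
base_name = os.path.join(tempfile.mkdtemp(), "crawled_content")

# Mode "c" opens the database for reading and writing, creating it if needed.
with dbm.open(base_name, "c") as db:
    # Keys and values are bytes, much like in a bsddb hash database.
    db[b"1"] = b"<html>...</html>"
    print(db[b"1"])

# Reopen read-only to confirm that the record was persisted to disk.
with dbm.open(base_name, "r") as db:
    print(len(db.keys()))
```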

content_db: Object used to access the Berkeley DB database.

id_iter: Number of database records.

get_data_iter: Number of records retrieved from the database.

parts_iter: Number of content data packages (files with crawled data) requested by the user.

add_content(url, links, content)

Adds crawled content to the database.

Parameters:	url (string) – Crawled page URL.
	links (string) – Links visited during the crawling process.
	content (string) – Crawled content to store in the database.
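A minimal sketch of what add_content might do, assuming each record is serialized as JSON and stored under a sequential numeric key taken from id_iter. The class name ContentDBSketch is hypothetical, and Python 3's dbm module stands in for bsddb, which does not exist in Python 3.

```python
import dbm
import json
import os
import tempfile


class ContentDBSketch:
    """Hypothetical stand-in for BerkeleyContentDB, backed by dbm instead of bsddb."""

    def __init__(self, base_name):
        self.content_db = dbm.open(base_name, "c")  # created if it does not exist
        self.id_iter = 0  # number of records added so far

    def add_content(self, url, links, content):
        # Assumption: the running record count doubles as the record key.
        self.id_iter += 1
        record = {"url": url, "links": links, "content": content}
        key = str(self.id_iter).encode("ascii")
        self.content_db[key] = json.dumps(record).encode("utf-8")


db = ContentDBSketch(os.path.join(tempfile.mkdtemp(), "crawled_content"))
db.add_content("http://example.com/", "http://example.com/about", "<html>...</html>")
stored = json.loads(db.content_db[b"1"].decode("utf-8"))
print(stored["url"], db.id_iter)
db.content_db.close()
```

Packing all three fields into one value keeps each crawled page a single key-value pair, which suits the flat key-value model that Berkeley DB and dbm share.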

get_file_with_data_package(size)

Returns the path to a file containing crawled data of the given size.

Parameters:	size (int) – Size of the requested data, in MB.
Returns:	Path to file with crawled data.
Return type:	string
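A hypothetical sketch of how such a data package could be assembled: records are copied into a new file until adding the next one would exceed size MB, and each copied record is deleted, consistent with retrieved records no longer being available. The free-function signature (taking the database object and an output directory), the numeric key format, and the newline-separated file layout are all assumptions. content_db may be any mutable mapping of bytes keys to bytes values, such as a dbm database or, as in the usage below, a plain dict.

```python
import os
import tempfile


def get_file_with_data_package(content_db, size, out_dir):
    """Hypothetical sketch: move records into a file of at most about `size` MB."""
    limit = size * 1024 * 1024  # requested package size in bytes
    written = 0
    fd, path = tempfile.mkstemp(suffix=".dat", dir=out_dir)
    with os.fdopen(fd, "wb") as out:
        # Assumption: keys are decimal record ids; sort numerically to keep insertion order.
        for key in sorted(content_db.keys(), key=int):
            chunk = content_db[key] + b"\n"
            # Stop before exceeding the limit, but always emit at least one record.
            if written and written + len(chunk) > limit:
                break
            out.write(chunk)
            written += len(chunk)
            # Packaged records are deleted, so they are no longer available afterwards.
            del content_db[key]
    return path


# Usage: three records of about 0.4 MB each, packaged into a 1 MB file.
store = {str(i).encode(): b"x" * 400_000 for i in range(1, 4)}
path = get_file_with_data_package(store, 1, tempfile.mkdtemp())
print(os.path.getsize(path), sorted(store))
```

Only the first two records fit under the 1 MB limit, so the third remains in the store for the next package.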

size()

Returns the number of elements (i.e. crawled content entries) currently available in the database. A record that has been retrieved via the web application or the API is no longer available and is therefore not counted.

Returns:	Number of elements in database.
Return type:	int

added_records_num()

Returns the number of entries describing sites crawled since crawling began, including entries that have already been retrieved and are no longer available.

Returns:	Number of added entries describing crawled sites.
Return type:	int
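The difference between size() and added_records_num() can be modelled with the id_iter and get_data_iter counters. This is an inference from the attribute descriptions, not something the source confirms: the sketch assumes added_records_num() returns id_iter and size() returns id_iter minus get_data_iter. The class CounterSketch and its record_added/record_retrieved helpers are hypothetical.

```python
class CounterSketch:
    """Hypothetical model of BerkeleyContentDB's bookkeeping counters."""

    def __init__(self):
        self.id_iter = 0        # records ever added
        self.get_data_iter = 0  # records already retrieved (no longer available)

    def record_added(self):
        self.id_iter += 1

    def record_retrieved(self):
        self.get_data_iter += 1

    def size(self):
        # Only records that have not been retrieved yet are counted.
        return self.id_iter - self.get_data_iter

    def added_records_num(self):
        # Every record added since crawling began, retrieved or not.
        return self.id_iter


counters = CounterSketch()
for _ in range(5):
    counters.record_added()
for _ in range(2):
    counters.record_retrieved()
print(counters.size(), counters.added_records_num())  # 3 5
```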

clear(): Clears the contents of the database and closes it.

show(): Prints the entries in the database.
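A rough sketch of show() and clear() written as free functions over a dbm database, used here as a stand-in for the bsddb handle held in content_db. The function forms, the numeric key format, and deleting entries one by one before closing are assumptions rather than the module's actual implementation.

```python
import dbm
import os
import tempfile


def show(content_db):
    """Print every entry, ordered by numeric record id (assumed key format)."""
    for key in sorted(content_db.keys(), key=int):
        print(key.decode(), content_db[key].decode("utf-8"))


def clear(content_db):
    """Delete every entry, then close the database handle."""
    for key in list(content_db.keys()):
        del content_db[key]
    content_db.close()


base_name = os.path.join(tempfile.mkdtemp(), "crawled_content")
db = dbm.open(base_name, "c")
db[b"1"] = b"http://example.com/"
db[b"2"] = b"http://example.org/"
show(db)
clear(db)

# Reopen to confirm that no entries survived clear().
with dbm.open(base_name, "r") as reopened:
    remaining = len(reopened.keys())
print(remaining)
```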