fcs.manager.management.commands.autoscaling

The autoscaling module. It is run as a Django application command.

CURRENT_PATH

Location of autoscaling module.

PATH_TO_SERVER

Path to Task Server web interface.

PATH_TO_CRAWLER

Path to crawler web interface.

SERVER_SPAWN_TIMEOUT

Period of time after which a Task Server that has not started is considered to have failed to spawn.

MAX_CRAWLERS_NUM

Maximum number of crawlers.

Default size of a package of links to be parsed by a crawler.

Minimum size of a package of links to be processed by a crawler.

STATS_PERIOD

How often statistics are computed.

MIN_CRAWLER_STATS_PERIOD

A period during which crawler efficiency is not evaluated.

MIN_SERVER_STATS_PERIOD

A period during which Task Server efficiency is not evaluated.

AUTOSCALING_PERIOD

How often the Task Servers' efficiency statistics are downloaded.

LOOP_PERIOD

A period of idleness between work cycles.
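As a rough illustration of how LOOP_PERIOD and AUTOSCALING_PERIOD could interact, the sketch below idles for LOOP_PERIOD between work cycles and triggers scaling only when AUTOSCALING_PERIOD has elapsed since the last scaling. The function name `run_cycles`, its parameters, and the numeric values are assumptions made for this example, not the module's actual code.

```python
import time

LOOP_PERIOD = 1.0          # assumed value, in seconds
AUTOSCALING_PERIOD = 5.0   # assumed value, in seconds


def run_cycles(cycles, autoscale, clock=time.monotonic, sleep=time.sleep):
    """Run a fixed number of work cycles (hypothetical helper).

    Calls `autoscale` whenever at least AUTOSCALING_PERIOD has passed
    since the previous scaling, then idles for LOOP_PERIOD.
    Returns how many times `autoscale` was called.
    """
    last_scaling = clock()
    runs = 0
    for _ in range(cycles):
        now = clock()
        if now - last_scaling >= AUTOSCALING_PERIOD:
            autoscale()
            last_scaling = now
            runs += 1
        sleep(LOOP_PERIOD)
    return runs
```

The clock and sleep functions are injectable so the timing logic can be exercised without real waiting.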

EFFICIENCY_THRESHOLD

Threshold for the actual-to-expected efficiency ratio. If this threshold is lower than the measured actual-to-expected efficiency ratio, no more crawlers are spawned.
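The comparison described above can be read literally as the following check. This is only a sketch: the helper name `may_spawn_crawler`, the threshold value, and the handling of a non-positive expected efficiency are assumptions, and the real module may combine this check with other conditions.

```python
EFFICIENCY_THRESHOLD = 0.8  # assumed value for illustration


def may_spawn_crawler(actual_efficiency, expected_efficiency,
                      threshold=EFFICIENCY_THRESHOLD):
    """Return True if the efficiency rule still allows spawning crawlers."""
    if expected_efficiency <= 0:
        # Assumption: without a meaningful ratio, do not spawn.
        return False
    ratio = actual_efficiency / expected_efficiency
    # Per the description: threshold lower than the ratio means no more spawning.
    return not (threshold < ratio)
```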

LOWER_LOAD_THRESHOLD

If the crawlers’ actual-to-expected load ratio is higher than this value, a new crawler is spawned.

UPPER_LOAD_THRESHOLD

If the crawlers’ actual-to-expected load ratio is lower than this value, one crawler is stopped.

INIT_SERVER_PORT

Port number of the first Task Server. Each subsequent Task Server gets the next higher port number.

INIT_CRAWLER_PORT

Port number of the first Crawling Unit. Each subsequent Crawling Unit gets the next higher port number.
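The port numbering scheme for Task Servers and Crawling Units might be sketched as two independent counters, each starting at its initial port. The helper `port_sequence` and the port values are hypothetical; the module itself tracks the next free ports in its own attributes.

```python
import itertools

INIT_SERVER_PORT = 18000   # assumed value for illustration
INIT_CRAWLER_PORT = 19000  # assumed value for illustration


def port_sequence(first_port):
    """Yield first_port, first_port + 1, ... (hypothetical helper)."""
    return itertools.count(first_port)


server_ports = port_sequence(INIT_SERVER_PORT)
crawler_ports = port_sequence(INIT_CRAWLER_PORT)
```

Calling `next(server_ports)` would then hand out the port for each newly spawned Task Server, and `next(crawler_ports)` would do the same for each new Crawling Unit.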

sigint_signal_handler(num, stack)

SIGINT signal handler. Kills all Crawling Units and Task Servers.

Parameters:
  • num (int) – signal number
  • stack (frame) – current stack frame (for details on the frame type, see the Python documentation)

class Command

Definition of the command ‘autoscaling’.

address

Address of this autoscaling module.

server_port

The lowest free port number for a new Task Server.

crawler_port

The lowest free port number for a new crawler.

last_scaling

Time of the last scaling.

old_crawlers

Attribute used to check whether any crawlers need to be assigned again.

changed

Attribute used to check whether any crawlers need to be assigned again.

handle(*args, **options)

Main command method, called when the command is run.

print_tasks()

Prints task details to standard output (usually the console window).

check_tasks_state()

Checks whether a new Task Server needs to be run for any of the tasks (e.g. because a task is new or a previous Task Server did not start).

check_server_assignment(task)

Checks whether a new Task Server needs to be run for the given task (e.g. because the task is new or a previous Task Server did not start) and runs one if needed.

Parameters:
  • task (Task) – task that may need a new Task Server assigned

handle_priority_changes()

If any task parameters that affect crawling speed have changed, the speed of every crawler is updated.

spawn_task_server(task)

Spawns a Task Server for the given task. This method is called in two cases: the task is new, or the previously assigned Task Server did not confirm that it launched properly.

Parameters:
  • task (Task) – task for which a new Task Server is spawned

spawn_crawler()

Spawns a new crawler.

assign_crawlers_to_servers()

Assigns a group of crawlers to every task.

autoscale()

Kills unresponsive servers and crawlers, calculates efficiency, and stops or spawns crawlers if necessary.