fcs.manager.management.commands.autoscaling¶
The autoscaling module. It is run as a Django management command.
- CURRENT_PATH¶
Location of autoscaling module.
- PATH_TO_SERVER¶
Path to Task Server web interface.
- PATH_TO_CRAWLER¶
Path to crawler web interface.
- SERVER_SPAWN_TIMEOUT¶
Period of time after which a Task Server that has not started is considered to have failed to spawn.
- MAX_CRAWLERS_NUM¶
Maximum number of crawlers.
- DEFAULT_LINK_QUEUE_SIZE¶
Default size of a package of links to be parsed by a crawler.
- MIN_LINK_PACKAGE_SIZE¶
Minimum size of a package of links to be processed by a crawler.
- STATS_PERIOD¶
How often statistics are computed.
- MIN_CRAWLER_STATS_PERIOD¶
A period during which crawler efficiency is not evaluated.
- MIN_SERVER_STATS_PERIOD¶
A period during which Task Server efficiency is not evaluated.
- AUTOSCALING_PERIOD¶
How often the Task Servers’ efficiency statistics are downloaded.
- LOOP_PERIOD¶
A period of idleness between work cycles.
- EFFICIENCY_THRESHOLD¶
Threshold for the actual-to-expected efficiency ratio. If this threshold is lower than the actual-to-expected efficiency ratio, no more crawlers are spawned.
- LOWER_LOAD_THRESHOLD¶
If the crawlers’ actual-to-expected load is higher than this value, a new crawler is spawned.
- UPPER_LOAD_THRESHOLD¶
If the crawlers’ actual-to-expected load is lower than this value, one crawler is stopped.
- INIT_SERVER_PORT¶
Port number of the first Task Server. Each subsequent Task Server gets a port number one higher than the previous one.
- INIT_CRAWLER_PORT¶
Port number of the first Crawling Unit. Each subsequent Crawling Unit gets a port number one higher than the previous one.
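The port numbering described for INIT_SERVER_PORT and INIT_CRAWLER_PORT can be pictured as a simple counter. The following is a minimal sketch, not the module's actual code: the class name PortCounter and the starting values 18000 and 8900 are hypothetical and chosen only for illustration.

```python
class PortCounter:
    """Hands out consecutive port numbers starting from an initial port."""

    def __init__(self, init_port):
        self.next_port = init_port

    def allocate(self):
        """Return the current port and advance the counter by one."""
        port = self.next_port
        self.next_port += 1
        return port


# Hypothetical starting ports; the real values are the module constants.
server_ports = PortCounter(18000)
crawler_ports = PortCounter(8900)

first_server = server_ports.allocate()
second_server = server_ports.allocate()
first_crawler = crawler_ports.allocate()
```

In this sketch the counter's next_port plays the role that server_port and crawler_port appear to play on the Command class, under the assumption that ports are never reused.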
- sigint_signal_handler(num, stack)¶
SIGINT signal handler. Kills all Crawling Units and Task Servers.
Parameters: - num (int) – signal number
- stack (frame) – current stack frame (for details on frame type, see Python documentation)
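For readers unfamiliar with Python signal handlers, the sketch below shows how a handler with the (num, stack) signature is registered through the standard signal module. It is an illustration only: the spawned and killed lists and the name sigint_handler are hypothetical stand-ins, and no real process is terminated.

```python
import signal

# Hypothetical registry of spawned units; the real module keeps its own state.
spawned = []
killed = []


def sigint_handler(num, stack):
    """Clean up every spawned unit when SIGINT arrives."""
    for unit in spawned:
        killed.append(unit)  # stand-in for actually terminating the process
    spawned.clear()


# Register the handler so that Ctrl+C triggers the cleanup.
signal.signal(signal.SIGINT, sigint_handler)
```

Note that signal.signal must be called from the main thread of the main interpreter, which is the usual situation for a command started from the console.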
- class Command¶
Definition of the command ‘autoscaling’.
- address¶
Address of this autoscaling module.
- server_port¶
The lowest free port number for a new Task Server.
- crawler_port¶
The lowest free port number for a new crawler.
- last_scaling¶
Time of last scaling.
- old_crawlers¶
Used to check whether some crawlers need to be reassigned.
- changed¶
Used to check whether some crawlers need to be reassigned.
- handle(*args, **options)¶
Main command method, called when command is run.
- print_tasks()¶
Prints task details to standard output (usually the console window).
- check_tasks_state()¶
Checks whether a new Task Server needs to be run for any of the tasks (e.g. because a task is new or a previous Task Server did not start).
- check_server_assignment(task)¶
Checks whether a new Task Server needs to be run for the given task (e.g. because the task is new or a previous Task Server did not start) and runs one if needed.
Parameters: task (Task) – task that may need a new Task Server assigned
- handle_priority_changes()¶
If any task parameters that affect crawling speed change, the speed of every crawler is updated.
- spawn_task_server(task)¶
Spawns a Task Server for the given task. This method is called in two cases: the task is new, or the previously assigned Task Server did not confirm that it launched properly.
Parameters: task (Task) – task for which a new Task Server is spawned
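The two cases in which a Task Server is spawned can be sketched as a small predicate. This is only an illustration under stated assumptions: the function names and arguments are hypothetical, times are assumed to be in seconds, and the timeout argument stands in for SERVER_SPAWN_TIMEOUT.

```python
import time


def server_failed_to_spawn(spawned_at, confirmed, timeout, now=None):
    """Return True if an unconfirmed server has exceeded the spawn timeout."""
    if confirmed:
        return False
    if now is None:
        now = time.time()
    return now - spawned_at > timeout


def needs_task_server(is_new, spawned_at, confirmed, timeout, now=None):
    """Return True for a new task or for a task whose server never started."""
    if is_new:
        return True
    return server_failed_to_spawn(spawned_at, confirmed, timeout, now)
```

For example, with a timeout of 10 seconds, a server spawned at time 80 that is still unconfirmed at time 100 would be replaced, while one spawned at time 95 would be given more time.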
- spawn_crawler()¶
Spawns a new crawler.
- assign_crawlers_to_servers()¶
Assigns a group of crawlers to every task.
- autoscale()¶
Kills unresponsive servers and crawlers, calculates efficiency, and stops or spawns crawlers as necessary.
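How the period constants might interact with last_scaling can be illustrated with a simulated work loop. This is a sketch under assumptions, not the documented behaviour: it assumes that autoscaling runs once AUTOSCALING_PERIOD has elapsed since last_scaling, that the loop idles for LOOP_PERIOD between cycles, and that all values are in seconds. The function names are hypothetical.

```python
def autoscaling_due(now, last_scaling, autoscaling_period):
    """Return True once a full autoscaling period has elapsed."""
    return now - last_scaling >= autoscaling_period


def simulate(total_time, loop_period, autoscaling_period):
    """Return the simulated times at which autoscaling would run."""
    now = 0
    last_scaling = 0
    runs = []
    while now <= total_time:
        if autoscaling_due(now, last_scaling, autoscaling_period):
            runs.append(now)  # stand-in for calling autoscale()
            last_scaling = now
        now += loop_period  # stand-in for time.sleep(loop_period)
    return runs


runs = simulate(total_time=60, loop_period=10, autoscaling_period=30)
```

Using a simulated clock instead of time.sleep keeps the sketch deterministic; a real loop would read the current time and sleep between cycles.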