Ap 2566 Smart Scheduling and Rollback Feature #163
…that don't correspond to taps
…rt_blocks are handled
…able time blocks)
Pull request overview
Changes: New feature (1), Database change (1), Documentation update (1), Maintenance (1)
This PR introduces a “SmartScheduling” capability to Cicada, using a Genetic Algorithm to shift cron start times to reduce load spikes, along with database-backed checkpointing/rollback support and local-dev seed data.
Changes:
- Add GA-based smart scheduling + rollback CLI commands and SmartScheduling module implementation.
- Extend DB schema with `schedule_backups` + `schedule_blacklist` (and snapshot trigger) to support checkpointing/rollback.
- Add docs (technical overview + diagrams) and local-dev SQL seeding; add new Python deps (`numpy`, `pygad`).
Reviewed changes
Copilot reviewed 15 out of 18 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| setup/schema.sql | Adds snapshot trigger/function and new tables/indexes for backups + blacklist. |
| setup/create_test_tap_setup.sql | Provides local/dev seed data for servers/schedules + blacklist examples. |
| setup.py | Adds dependencies needed by the GA implementation. |
| local-dev/entrypoint.sh | Loads the new seed SQL into local dev DB on startup. |
| docs/offspring-ga.png | Diagram for GA offspring/crossover. |
| docs/genetic-algorithm-process-cycle.png | Diagram for GA process cycle. |
| docs/Smart Scheduler Technical Overview.md | Technical design/architecture documentation for SmartScheduling. |
| cicada/lib/scheduler.py | Adds DB helpers for backups/blacklist + rollback/restore operations. |
| cicada/lib/SmartScheduling/config.py | Defines GAConfig defaults. |
| cicada/lib/SmartScheduling/domain.py | Introduces Tap domain object + cron-derived attributes. |
| cicada/lib/SmartScheduling/evaluation.py | Adds fitness evaluation (CPU usage diff array + peak). |
| cicada/lib/SmartScheduling/pygad.py | Implements GA solve loop using PyGAD. |
| cicada/commands/upsert_schedule.py | Resets schedule backup baseline on upsert. |
| cicada/commands/smart_schedule.py | New command to optimize schedules and write results/checkpoints. |
| cicada/commands/rollback.py | New command to rollback schedules using schedule_backups. |
| cicada/commands/delete_schedule.py | Deletes a schedule’s backup record as well. |
| cicada/cli.py | Wires new commands into the CLI (partially). |
| CLAUDE.md | Adds repository/architecture guidance including SmartScheduling overview. |
Comments suppressed due to low confidence (2)
cicada/cli.py:44
`test_functional_cli_entrypoint.py` asserts the top-level `cicada -h` output, including the command list. Adding new commands here will change that output and will cause those CLI tests to fail unless they’re updated to include `smart_schedule`/`rollback`. Consider updating the test expectations (and adding coverage for the new subcommands’ `-h` output).
def __init__(self):
command_list = [
"register_server",
"list_server_schedules",
"exec_server_schedules",
"smart_schedule",
"show_schedule",
"upsert_schedule",
"exec_schedule",
"spread_schedules",
"archive_schedule_log",
"ping_slack",
"list_schedule_ids",
"delete_schedule",
"version",
]
cicada/cli.py:44
`rollback` is implemented and imported, but it isn't listed in `command_list`. As a result, `cicada rollback ...` will be rejected as an unrecognized command. Add `rollback` to `command_list` so it is discoverable and dispatchable like the other subcommands.
command_list = [
"register_server",
"list_server_schedules",
"exec_server_schedules",
"smart_schedule",
"show_schedule",
"upsert_schedule",
"exec_schedule",
"spread_schedules",
"archive_schedule_log",
"ping_slack",
"list_schedule_ids",
"delete_schedule",
"version",
]
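To make the failure mode concrete, here is a minimal sketch of how a list-driven CLI rejects a command that exists in code but is missing from the list. It assumes the dispatcher validates the subcommand with `argparse` `choices`, which may not match how `cicada/cli.py` actually does it; `COMMAND_LIST`, `build_parser` and `is_dispatchable` are hypothetical names used only for illustration.

```python
import argparse

# Hypothetical, trimmed-down command list; "rollback" is deliberately absent.
COMMAND_LIST = ["smart_schedule", "show_schedule", "version"]


def build_parser(commands):
    """Build a parser that only accepts subcommands present in `commands`."""
    parser = argparse.ArgumentParser(prog="cicada")
    parser.add_argument("command", choices=commands)
    return parser


def is_dispatchable(commands, argv):
    """Return True if argv[0] would be accepted as a subcommand."""
    try:
        build_parser(commands).parse_args(argv)
    except SystemExit:
        # argparse reports an invalid choice and exits with status 2.
        return False
    return True


# "rollback" only becomes dispatchable once it is added to the list.
print(is_dispatchable(COMMAND_LIST, ["rollback"]))
print(is_dispatchable(COMMAND_LIST + ["rollback"], ["rollback"]))
```

Under that assumption, importing the `rollback` module is not enough; the name has to appear in the list that the parser validates against.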
Co-authored-by: Copilot <copilot@github.com>
| **New Tables:** | ||
| - **`schedule_backups`** — Audit trail of schedule modifications |
There was a problem hiding this comment.
I see this in `scheduler.py`:
sqlquery = """
INSERT INTO schedule_backups (schedule_id, server_id, interval_mask, smart_interval_mask, snapshot_id)
SELECT schedule_id, server_id, interval_mask, smart_interval_mask, %s
FROM schedules WHERE schedule_id = ANY(%s)
"""
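For reference, a rough sketch of how a checkpoint insert like the one quoted above could pair with a restore step. It uses an in-memory SQLite database purely so the example is self-contained (the `ANY(%s)` syntax in the PR suggests PostgreSQL, which would differ), and `make_db`, `checkpoint` and `rollback` are hypothetical helpers, not the functions in `scheduler.py`. Only the table and column names are taken from the quoted query.

```python
import sqlite3


def make_db():
    """Create the two tables with just the columns named in the quoted query."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE schedules (
            schedule_id TEXT PRIMARY KEY, server_id INTEGER,
            interval_mask TEXT, smart_interval_mask TEXT);
        CREATE TABLE schedule_backups (
            schedule_id TEXT, server_id INTEGER,
            interval_mask TEXT, smart_interval_mask TEXT, snapshot_id INTEGER);
        """
    )
    return db


def checkpoint(db, snapshot_id, schedule_ids):
    """Copy the current state of the given schedules under one snapshot id."""
    marks = ", ".join("?" for _ in schedule_ids)
    db.execute(
        f"""INSERT INTO schedule_backups
                (schedule_id, server_id, interval_mask, smart_interval_mask, snapshot_id)
            SELECT schedule_id, server_id, interval_mask, smart_interval_mask, ?
            FROM schedules WHERE schedule_id IN ({marks})""",
        [snapshot_id, *schedule_ids],
    )


def rollback(db, snapshot_id):
    """Restore smart_interval_mask for every schedule captured in the snapshot."""
    db.execute(
        """UPDATE schedules
           SET smart_interval_mask = (
               SELECT b.smart_interval_mask FROM schedule_backups b
               WHERE b.schedule_id = schedules.schedule_id AND b.snapshot_id = ?)
           WHERE schedule_id IN (
               SELECT schedule_id FROM schedule_backups WHERE snapshot_id = ?)""",
        (snapshot_id, snapshot_id),
    )
```

In this sketch each backup row is a point-in-time copy keyed by `snapshot_id`, so the table behaves as a set of restorable snapshots rather than a log of individual modifications.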
| sqlquery = f"UPDATE schedules SET {', '.join(case_clauses)} WHERE schedule_id = ANY(%s)" | ||
| db_cur.execute(sqlquery, tuple(params)) | ||
| return |
There was a problem hiding this comment.
What is the purpose of this `return`?
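For readers following the thread, the `case_clauses` / `params` pattern in the hunk above could be assembled roughly as follows. This is a sketch under assumptions: the shape of the `CASE` expression, the choice of `smart_interval_mask` as the updated column, and the helper name `build_bulk_update` are all guesses for illustration; only the outer `UPDATE schedules SET ... WHERE schedule_id = ANY(%s)` form comes from the diff.

```python
def build_bulk_update(new_masks):
    """Build one parameterised UPDATE for many schedules.

    `new_masks` maps schedule_id -> new smart_interval_mask (hypothetical input).
    Returns (sql, params) suitable for a DB-API cursor.execute(sql, params) call.
    """
    if not new_masks:
        raise ValueError("new_masks must not be empty")

    when_parts = []
    params = []
    for schedule_id, mask in new_masks.items():
        when_parts.append("WHEN %s THEN %s")
        params.extend([schedule_id, mask])

    # One CASE clause per updated column; this sketch updates a single column.
    case_clauses = [
        "smart_interval_mask = CASE schedule_id "
        + " ".join(when_parts)
        + " ELSE smart_interval_mask END"
    ]
    sql = f"UPDATE schedules SET {', '.join(case_clauses)} WHERE schedule_id = ANY(%s)"
    params.append(list(new_masks))  # bound to ANY(%s) as an array
    return sql, tuple(params)
```

All values stay in `params`, so nothing user-controlled is interpolated into the SQL string. As for the question itself: if that `return` is the function's final statement, it has no effect, since a Python function with no explicit return value returns `None` either way.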
| - **Mutation Type**: Random (randomly select genes and replace with random value from gene space) | ||
| - **Elitism**: Keep the best solution across generations (default: 1) | ||
| The creation of the offsprings uses different methods to change the solutions, however they must remain within the gene limits. For more information checkout the official [PyGAD documentation](https://pypi.org/project/pygad/5.3.0/) as it will be infinitely better than anything I can produce |
There was a problem hiding this comment.
Hardcoding minor package version strings creates immediate maintenance overhead because the URLs break or point to stale API specs when the requirements are upgraded!
…er than conditional case-based query
| diff = np.zeros(mins_per_day + 1, dtype=float) | ||
| for i, schedule in enumerate(schedules): | ||
| if not schedule.frequency_is_supported(): |
There was a problem hiding this comment.
If I understood correctly, this method evaluates many potential solutions and this check is done for all of them. I think it's better to do this filtering one time before calling this method and only sending the supported ones here.
There was a problem hiding this comment.
The issue with that is that you'd also have to remove the corresponding entry from `start_times`, since it is passed in as a separate sequence rather than being tied to the schedule. I don't see much benefit to pre-filtering: this is an internal method of the class, so the number of function calls couldn't be reduced.
There was a problem hiding this comment.
Use one leading underscore for non-public methods!
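For context on what this thread is discussing, here is a minimal sketch of the difference-array technique the hunk appears to use (`diff` sized `mins_per_day + 1`, then a running sum to get per-minute load). It uses plain Python with `itertools.accumulate` instead of NumPy to stay dependency-free, and it simplifies heavily: each job runs once per day, has unit weight, and does not wrap past midnight. `peak_load`, `durations` and `supported` are hypothetical names.

```python
from itertools import accumulate

MINS_PER_DAY = 24 * 60


def peak_load(start_times, durations, supported):
    """Peak number of concurrently running jobs over one day.

    `start_times` is a separate sequence aligned by index with the schedules,
    which is why unsupported schedules are skipped in the loop rather than
    filtered out beforehand.
    """
    diff = [0] * (MINS_PER_DAY + 1)
    for start, duration, ok in zip(start_times, durations, supported):
        if not ok:
            continue  # stands in for the frequency_is_supported() check
        end = min(start + duration, MINS_PER_DAY)
        diff[start] += 1  # job begins contributing load
        diff[end] -= 1    # job stops contributing load
    load = list(accumulate(diff))  # running sum gives load per minute
    return max(load)


print(peak_load([0, 0, 0], [10, 10, 10], [True, True, True]))
print(peak_load([0, 10, 20], [10, 10, 10], [True, True, True]))
```

Each schedule costs two array writes regardless of its duration, and the skip check is a single branch per schedule, so the per-candidate cost of the check is small compared with the running sum over the whole day.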
| def get_env_vars(): | ||
| """get_env_vars""" | ||
| pytest.cicada_home = os.environ.get("CICADA_HOME") |
There was a problem hiding this comment.
Storing arbitrary attributes on the pytest module is an anti-pattern. If you ever run tests concurrently (using pytest-xdist), it can cause race conditions or strange scope leaks.
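One possible alternative, sketched under assumptions: expose the value through a fixture so each test requests it explicitly instead of reading a shared attribute off the `pytest` module. The helper `read_cicada_home` and the fixture name `cicada_home` are hypothetical; only the `CICADA_HOME` variable comes from the diff.

```python
import os

import pytest


def read_cicada_home(environ):
    """Return the configured CICADA_HOME, or None if it is not set."""
    return environ.get("CICADA_HOME")


@pytest.fixture(scope="session")
def cicada_home():
    """Session-scoped value resolved by pytest's fixture machinery."""
    return read_cicada_home(os.environ)


def test_cicada_home_is_configured(cicada_home):
    # Each test receives the value as an argument; nothing is stored on `pytest`.
    if cicada_home is None:
        pytest.skip("CICADA_HOME is not set in this environment")
    assert os.path.isdir(cicada_home)
```

Keeping the lookup in a plain function also makes it easy to test with a fake environment mapping, without touching the real `os.environ`.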
| def test_get_schedules_per_server_no_schedules_single_server(self, db_setup): | ||
| """Test that _get_schedules_per_server raises ValueError when no schedules exist for a server""" | ||
| try: |
There was a problem hiding this comment.
You wrap the entire test body in a blanket try/except. This adds visual noise. Pytest handles unhandled exceptions natively and automatically generates rich traceback logs. Catching and re-raising general exceptions serves no functional purpose here.
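A sketch of what the test could look like without the wrapper, using `pytest.raises` for the one exception the docstring says is expected. The function `get_schedules_per_server` below is a hypothetical stand-in so the example runs on its own; it is not the PR's `_get_schedules_per_server`, and the error message is invented.

```python
import pytest


def get_schedules_per_server(schedules_by_server, server_id):
    """Hypothetical stand-in: raise ValueError when a server has no schedules."""
    schedules = schedules_by_server.get(server_id, [])
    if not schedules:
        raise ValueError(f"No schedules found for server {server_id}")
    return schedules


def test_no_schedules_single_server():
    # Only the expected exception is asserted; anything else propagates
    # and pytest reports it with a full traceback.
    with pytest.raises(ValueError, match="No schedules"):
        get_schedules_per_server({"server-1": []}, "server-1")
```

If cleanup is what the `try` block was for, a fixture with a `yield` is the usual place for it, which keeps the test body limited to the behaviour under test.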
| def test_main_no_schedules_single_server(self, db_setup, capsys): | ||
| """Test that main() handles servers without schedules gracefully (single server)""" | ||
| try: |
There was a problem hiding this comment.
Context
SmartScheduling Feature: Genetic Algorithm-Based Schedule Load Distribution
Summary
Implements SmartScheduling, a genetic algorithm-based optimization module that automatically distributes scheduled jobs across a 24-hour period to minimize resource conflicts and peak CPU load. This is the core feature for Cicada's ability to optimize job scheduling across distributed nodes.
The branch includes:
Problem
On distributed job schedulers, jobs on a server tend to cluster at similar times (e.g., on the hour or half hour, at :00 or :30), causing resource spikes. Cicada needed a way to automatically shift job start times to spread load evenly across the day.
Solution
SmartScheduling uses a Genetic Algorithm to find near-optimal shift values for each job:
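As a rough illustration of the idea, here is a tiny hand-rolled genetic algorithm in plain Python. It is not the PR's PyGAD-based implementation: the fitness function, population handling, parameter names and defaults (`max_shift`, `pop_size`, `generations`) are all assumptions chosen to keep the sketch short, and every job is modelled as one fixed-length run per day.

```python
import random

MINS_PER_DAY = 24 * 60


def peak_load(starts, duration=15):
    """Peak number of overlapping jobs, each running `duration` minutes."""
    load = [0] * MINS_PER_DAY
    for start in starts:
        for minute in range(start, start + duration):
            load[minute % MINS_PER_DAY] += 1
    return max(load)


def optimise_shifts(base_starts, max_shift=30, pop_size=20, generations=40, seed=0):
    """Search for per-job shifts (in minutes) that lower the peak load."""
    rng = random.Random(seed)
    n = len(base_starts)

    def fitness(shifts):
        # Higher is better, so negate the peak load of the shifted schedule.
        return -peak_load([s + d for s, d in zip(base_starts, shifts)])

    # Include the "no shift" solution so the result is never worse than today.
    population = [[0] * n] + [
        [rng.randint(0, max_shift) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [population[0]]  # elitism: carry the best solution forward
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = mum[:cut] + dad[cut:]  # single-point crossover
            child[rng.randrange(n)] = rng.randint(0, max_shift)  # random mutation
            children.append(child)
        population = children
    return max(population, key=fitness)


base = [0] * 6  # six jobs that all start at minute 0
shifts = optimise_shifts(base)
print(peak_load(base), peak_load([s + d for s, d in zip(base, shifts)]))
```

Because the unshifted solution is in the initial population and the best individual is always carried over, the returned shifts can only match or improve on the original peak in this sketch.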
Key Features
Architecture (`cicada/lib/SmartScheduling/`)
- `domain.py`: `Schedule` dataclass representing a schedulable job with cron parsing
- `config.py`: `GAConfig` hyperparameters for the genetic algorithm
- `pygad.py`: PyGAD wrapper with fitness function (evaluates resource contention)
- `evaluation.py`: load calculation and peak detection
Commands
- `smart_schedule.py`: Main orchestrator; runs GA optimization and updates DB
- `smart_schedule_rollback.py`: Reverts optimization using checkpoint history
- `blocklist.py`: Allows adding/removing schedules from the blocklist (preventing them from being optimised)
Database Changes
- `schedule_backups`: Stores optimization snapshots with checkpoints for rollback
- `snapshots`: Stores snapshot metadata (`snapshot_timestamp`, server affected, etc.)
Documentation
- `docs/Smart Scheduler Technical Overview.md`: Algorithm details, hyperparameters, tuning guide
- `CLAUDE.md` updated with SmartScheduling development guidance
Checklist