
Ap 2566 Smart Scheduling and Rollback Feature #163

Open: naomiwise wants to merge 51 commits into main from AP-2566-smart-scheduler



naomiwise commented Apr 20, 2026

Context

SmartScheduling Feature: Genetic Algorithm-Based Schedule Load Distribution

Summary

Implements SmartScheduling, a genetic algorithm-based optimization module that automatically distributes scheduled jobs across a 24-hour period to minimize resource conflicts and peak CPU load. This is the core feature for Cicada's ability to optimize job scheduling across distributed nodes.

The branch includes:

  • Complete GA optimization pipeline with PyGAD integration
  • Database schema and backup/rollback mechanism for optimization snapshots
  • CLI commands for running and managing smart scheduling
  • Documentation and Testing

Problem

On distributed job schedulers, the jobs on a server tend to cluster at the same times (e.g., at :00 or :30 past the hour), causing resource spikes. Cicada needed a way to automatically shift job start times so that load is spread evenly across the day.

Solution

SmartScheduling uses a Genetic Algorithm to find near-optimal shift values for each job:

  • Represents each job as a "schedule" with frequency, runtime, and CPU requirements
  • Evolves shift assignments over generations to minimize peak CPU load
  • Supports job frequency ranges: 1-60 minutes, hourly, and daily
  • Automatically excludes blacklisted, irregular and system schedules from optimization
  • Creates checkpoints for safe rollback if optimization degrades performance
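
As a rough illustration of the idea in the list above (and not the PR's PyGAD-based implementation), the loop below evolves per-job shift values with a hand-rolled GA in plain Python. The `Job` tuple, the `peak_load` helper, and every hyperparameter value are hypothetical; only the overall approach (one shift gene per job, peak CPU load as the fitness, elitism, random mutation within the gene range) follows the description.

```python
import random
from typing import List, NamedTuple

MINS_PER_DAY = 1440


class Job(NamedTuple):
    """Hypothetical job description: runs every `frequency` minutes."""
    frequency: int  # minutes between runs (e.g. 60 = hourly)
    runtime: int    # minutes each run lasts
    cpu: float      # CPU units consumed while running


def peak_load(jobs: List[Job], shifts: List[int]) -> float:
    """Return the highest total CPU load seen in any minute of the day."""
    load = [0.0] * MINS_PER_DAY
    for job, shift in zip(jobs, shifts):
        for start in range(shift, MINS_PER_DAY, job.frequency):
            for minute in range(start, start + job.runtime):
                load[minute % MINS_PER_DAY] += job.cpu
    return max(load)


def evolve_shifts(jobs, generations=40, pop_size=30, mutation_rate=0.2, seed=0):
    """Tiny GA: each gene is a job's shift in [0, frequency)."""
    rng = random.Random(seed)

    def random_solution():
        return [rng.randrange(job.frequency) for job in jobs]

    # Include the unshifted schedule so the result is never worse than it.
    population = [[0] * len(jobs)] + [random_solution() for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda s: peak_load(jobs, s))
        parents = population[: pop_size // 2]
        children = [population[0]]  # elitism: carry the best solution over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(jobs)) if len(jobs) > 1 else 0
            child = a[:cut] + b[cut:]  # single-point crossover
            for i, job in enumerate(jobs):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(job.frequency)  # stay in gene range
            children.append(child)
        population = children
    return min(population, key=lambda s: peak_load(jobs, s))


jobs = [Job(frequency=60, runtime=10, cpu=1.0) for _ in range(4)]
best = evolve_shifts(jobs)
print(peak_load(jobs, [0, 0, 0, 0]), "->", peak_load(jobs, best))
```

Because the best solution is carried into every generation, the returned shifts can never produce a higher peak than the unshifted schedule that seeds the population.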

Key Features

Architecture (cicada/lib/SmartScheduling/)

  • domain.py: Schedule dataclass representing a schedulable job with cron parsing
  • config.py: GAConfig hyperparameters for the genetic algorithm
  • pygad.py: PyGAD wrapper with fitness function (evaluates resource contention)
  • evaluation.py: load calculation and peak detection
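
For readers unfamiliar with the kind of load calculation attributed to evaluation.py, a difference array is one common way to do it. The sketch below is an assumption about the shape of that logic, not the module's actual code: the `Schedule` fields and the `peak_cpu` name are hypothetical, it uses plain lists rather than NumPy, and runs that cross midnight are simply truncated at the end of the day.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Sequence

MINS_PER_DAY = 1440


@dataclass(frozen=True)
class Schedule:
    """Hypothetical stand-in for the PR's Schedule dataclass."""
    frequency: int  # minutes between runs
    runtime: int    # minutes per run
    cpu: float      # CPU units used while running


def peak_cpu(schedules: Sequence[Schedule], start_times: Sequence[int]) -> float:
    """Peak per-minute CPU load for the given start offsets (in minutes)."""
    diff: List[float] = [0.0] * (MINS_PER_DAY + 1)
    for schedule, start in zip(schedules, start_times):
        for begin in range(start, MINS_PER_DAY, schedule.frequency):
            end = min(begin + schedule.runtime, MINS_PER_DAY)  # truncate at midnight
            diff[begin] += schedule.cpu  # load rises when the run starts
            diff[end] -= schedule.cpu    # and falls when it finishes
    # A running sum of the deltas gives the load in each minute.
    return max(accumulate(diff[:MINS_PER_DAY]))


hourly = Schedule(frequency=60, runtime=15, cpu=2.0)
print(peak_cpu([hourly, hourly], [0, 0]))   # both runs overlap
print(peak_cpu([hourly, hourly], [0, 30]))  # runs no longer overlap
```

Each run costs two array updates regardless of its runtime, which matters when the fitness function is evaluated for every candidate in every generation.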

Commands

  • smart_schedule.py: Main orchestrator; runs GA optimization and updates DB
  • smart_schedule_rollback.py: Reverts optimization using checkpoint history
  • blocklist.py: Allows adding and removing schedules from the blocklist (preventing them from being optimised)

Database Changes

  • schedule_backups: Stores optimization snapshots with checkpoints for rollback
  • snapshots: Stores snapshot metadata (snapshot_timestamp, affected server, etc.)
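
The checkpoint and rollback flow these two tables support might look roughly like the following. This is a sketch under stated assumptions: it uses an in-memory SQLite database rather than the project's real database, the `snapshots` columns other than the timestamp are guesses, and `take_snapshot` and `rollback` are hypothetical helper names. The `schedule_backups` column list is taken from the INSERT statement quoted later in this conversation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedules (schedule_id TEXT PRIMARY KEY, server_id INTEGER,
                            interval_mask TEXT, smart_interval_mask TEXT);
    CREATE TABLE snapshots (snapshot_id INTEGER PRIMARY KEY,
                            snapshot_timestamp TEXT DEFAULT CURRENT_TIMESTAMP,
                            server_id INTEGER);
    CREATE TABLE schedule_backups (schedule_id TEXT, server_id INTEGER,
                                   interval_mask TEXT, smart_interval_mask TEXT,
                                   snapshot_id INTEGER);
""")
conn.execute("INSERT INTO schedules VALUES ('job-a', 1, '0 * * * *', NULL)")


def take_snapshot(conn, server_id):
    """Copy the server's current schedule rows into a new checkpoint."""
    cur = conn.execute("INSERT INTO snapshots (server_id) VALUES (?)", (server_id,))
    snapshot_id = cur.lastrowid
    conn.execute(
        """INSERT INTO schedule_backups
           SELECT schedule_id, server_id, interval_mask, smart_interval_mask, ?
           FROM schedules WHERE server_id = ?""",
        (snapshot_id, server_id),
    )
    return snapshot_id


def rollback(conn, snapshot_id):
    """Restore smart_interval_mask values recorded in the given checkpoint."""
    conn.execute(
        """UPDATE schedules
           SET smart_interval_mask = (
               SELECT b.smart_interval_mask FROM schedule_backups AS b
               WHERE b.schedule_id = schedules.schedule_id AND b.snapshot_id = ?)
           WHERE schedule_id IN (
               SELECT schedule_id FROM schedule_backups WHERE snapshot_id = ?)""",
        (snapshot_id, snapshot_id),
    )


checkpoint = take_snapshot(conn, server_id=1)
conn.execute("UPDATE schedules SET smart_interval_mask = '7 * * * *'")  # "optimise"
rollback(conn, checkpoint)
print(conn.execute("SELECT smart_interval_mask FROM schedules").fetchone())
```

Taking the snapshot before the optimised values are written is what makes the rollback safe: the checkpoint always holds the last known state.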

Documentation

  • docs/Smart Scheduler Technical Overview.md - Algorithm details, hyperparameters, tuning guide
  • CLAUDE.md updated with SmartScheduling development guidance

Checklist

Copilot AI review requested due to automatic review settings April 20, 2026 14:55
@naomiwise naomiwise requested a review from a team as a code owner April 20, 2026 14:55
@platon-github-app-production


Comment /request-review to automatically request reviews from the following teams:

You can also request review from a specific team by commenting /request-review team-name, or you can add a description with --notes "<message>"

💡 If you see something that doesn't look right, check the configuration guide.

naomiwise changed the title from "Ap 2566 Smart Scheduling" to "Ap 2566 Smart Scheduling and Rollback Feature" on Apr 20, 2026

Copilot AI left a comment


Pull request overview

Changes: New feature (1), Database change (1), Documentation update (1), Maintenance (1)

This PR introduces a “SmartScheduling” capability to Cicada, using a Genetic Algorithm to shift cron start times to reduce load spikes, along with database-backed checkpointing/rollback support and local-dev seed data.

Changes:

  • Add GA-based smart scheduling + rollback CLI commands and SmartScheduling module implementation.
  • Extend DB schema with schedule_backups + schedule_blacklist (and snapshot trigger) to support checkpointing/rollback.
  • Add docs (technical overview + diagrams) and local-dev SQL seeding; add new Python deps (numpy, pygad).

Reviewed changes

Copilot reviewed 15 out of 18 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| setup/schema.sql | Adds snapshot trigger/function and new tables/indexes for backups + blacklist. |
| setup/create_test_tap_setup.sql | Provides local/dev seed data for servers/schedules + blacklist examples. |
| setup.py | Adds dependencies needed by the GA implementation. |
| local-dev/entrypoint.sh | Loads the new seed SQL into local dev DB on startup. |
| docs/offspring-ga.png | Diagram for GA offspring/crossover. |
| docs/genetic-algorithm-process-cycle.png | Diagram for GA process cycle. |
| docs/Smart Scheduler Technical Overview.md | Technical design/architecture documentation for SmartScheduling. |
| cicada/lib/scheduler.py | Adds DB helpers for backups/blacklist + rollback/restore operations. |
| cicada/lib/SmartScheduling/config.py | Defines GAConfig defaults. |
| cicada/lib/SmartScheduling/domain.py | Introduces Tap domain object + cron-derived attributes. |
| cicada/lib/SmartScheduling/evaluation.py | Adds fitness evaluation (CPU usage diff array + peak). |
| cicada/lib/SmartScheduling/pygad.py | Implements GA solve loop using PyGAD. |
| cicada/commands/upsert_schedule.py | Resets schedule backup baseline on upsert. |
| cicada/commands/smart_schedule.py | New command to optimize schedules and write results/checkpoints. |
| cicada/commands/rollback.py | New command to rollback schedules using schedule_backups. |
| cicada/commands/delete_schedule.py | Deletes a schedule’s backup record as well. |
| cicada/cli.py | Wires new commands into the CLI (partially). |
| CLAUDE.md | Adds repository/architecture guidance including SmartScheduling overview. |

Comments suppressed due to low confidence (2)

cicada/cli.py:44

  • test_functional_cli_entrypoint.py asserts the top-level cicada -h output, including the command list. Adding new commands here will change that output and will cause those CLI tests to fail unless they’re updated to include smart_schedule/rollback. Consider updating the test expectations (and adding coverage for the new subcommands’ -h output).
    def __init__(self):
        command_list = [
            "register_server",
            "list_server_schedules",
            "exec_server_schedules",
            "smart_schedule",
            "show_schedule",
            "upsert_schedule",
            "exec_schedule",
            "spread_schedules",
            "archive_schedule_log",
            "ping_slack",
            "list_schedule_ids",
            "delete_schedule",
            "version",
        ]

cicada/cli.py:44

  • rollback is implemented and imported, but it isn't listed in command_list. As a result cicada rollback ... will be rejected as an unrecognized command. Add rollback to command_list so it is discoverable and dispatchable like the other subcommands.
        command_list = [
            "register_server",
            "list_server_schedules",
            "exec_server_schedules",
            "smart_schedule",
            "show_schedule",
            "upsert_schedule",
            "exec_schedule",
            "spread_schedules",
            "archive_schedule_log",
            "ping_slack",
            "list_schedule_ids",
            "delete_schedule",
            "version",
        ]
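
To illustrate the failure mode described in this comment: if the CLI validates the requested subcommand against command_list before dispatching, a command that is implemented but not listed is rejected up front. The argparse wiring below is a hypothetical sketch of that pattern, not the actual contents of cicada/cli.py, and the command list is shortened.

```python
import argparse

# Shortened, hypothetical command list; "rollback" is deliberately missing.
command_list = ["smart_schedule", "show_schedule", "version"]


def parse_command(argv, commands):
    """Return the requested subcommand, or None if argparse rejects it."""
    parser = argparse.ArgumentParser(prog="cicada")
    parser.add_argument("command", choices=commands)
    try:
        return parser.parse_args(argv).command
    except SystemExit:
        # argparse prints a usage error and exits on an unknown choice.
        return None


print(parse_command(["rollback"], command_list))                 # rejected
print(parse_command(["rollback"], command_list + ["rollback"]))  # accepted
```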


Comment thread cicada/lib/SmartScheduling/domain.py Outdated
Comment thread cicada/lib/scheduler.py Outdated
Comment thread setup/schema.sql Outdated
Comment thread cicada/lib/SmartScheduling/pygad.py Outdated
Comment thread cicada/commands/smart_schedule.py Outdated
Comment thread setup/create_test_tap_setup.sql Outdated
Comment thread docs/Smart Scheduler Technical Overview.md Outdated
Comment thread cicada/lib/scheduler.py Outdated
Comment thread cicada/lib/scheduler.py Outdated
Comment thread cicada/commands/smart_schedule.py Outdated
Comment thread cicada/commands/delete_schedule.py Outdated
naomiwise and others added 2 commits May 1, 2026 16:41
Co-authored-by: Copilot <copilot@github.com>
Comment thread docs/offspring_ga.png
Comment thread docs/smart_scheduler_technical_overview.md

**New Tables:**

- **`schedule_backups`** — Audit trail of schedule modifications

Contributor

I see this in scheduler.py:

sqlquery = """
        INSERT INTO schedule_backups (schedule_id, server_id, interval_mask, smart_interval_mask, snapshot_id)
        SELECT schedule_id, server_id, interval_mask, smart_interval_mask, %s
        FROM schedules WHERE schedule_id = ANY(%s)
    """

Comment thread docs/smart_scheduler_technical_overview.md
Comment thread cicada/lib/scheduler.py Outdated
sqlquery = f"UPDATE schedules SET {', '.join(case_clauses)} WHERE schedule_id = ANY(%s)"
db_cur.execute(sqlquery, tuple(params))

return

Contributor

what is the use of this?

- **Mutation Type**: Random (randomly select genes and replace with random value from gene space)
- **Elitism**: Keep the best solution across generations (default: 1)

The creation of the offsprings uses different methods to change the solutions, however they must remain within the gene limits. For more information checkout the official [PyGAD documentation](https://pypi.org/project/pygad/5.3.0/) as it will be infinitely better than anything I can produce

Contributor

Hardcoding minor package version strings creates immediate maintenance overhead because the URLs break or point to stale API specs when the requirements are upgraded!

Comment thread cicada/lib/scheduler.py
Comment thread cicada/lib/smart_scheduling/ga_pygad.py
Comment thread cicada/commands/smart_schedule_rollback.py Outdated
Comment thread cicada/commands/smart_schedule_rollback.py Outdated
Comment thread cicada/lib/smart_scheduling/config.py Outdated
diff = np.zeros(mins_per_day + 1, dtype=float)

for i, schedule in enumerate(schedules):
    if not schedule.frequency_is_supported():

Contributor

If I understood correctly, this method evaluates many potential solutions and this check is done for all of them. I think it's better to do this filtering one time before calling this method and only sending the supported ones here.

Author

The issue with that is that you'd also have to remove the schedule's entry from start_times, since start_times is not tied to the schedule and is passed in as a separate sequence. I don't see much benefit to pre-filtering: this is an internal method of the class, so the number of function calls wouldn't be reduced.
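
One way the start_times concern could be handled, sketched under assumptions: filter the schedules and their start times as pairs in a single pass. `FakeSchedule`, its support rule, and `filter_supported` are hypothetical; only `frequency_is_supported()` and `start_times` appear in the diff.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FakeSchedule:
    """Hypothetical stand-in; only frequency_is_supported() is from the diff."""
    name: str
    frequency: int

    def frequency_is_supported(self) -> bool:
        # Assumed rule for illustration: anything up to daily is supported.
        return 1 <= self.frequency <= 1440


def filter_supported(
    schedules: Sequence[FakeSchedule], start_times: Sequence[int]
) -> Tuple[List[FakeSchedule], List[int]]:
    """Drop unsupported schedules and their start times in one pass."""
    pairs = [
        (schedule, start)
        for schedule, start in zip(schedules, start_times)
        if schedule.frequency_is_supported()
    ]
    kept_schedules = [schedule for schedule, _ in pairs]
    kept_starts = [start for _, start in pairs]
    return kept_schedules, kept_starts


schedules = [FakeSchedule("a", 15), FakeSchedule("weekly", 10080), FakeSchedule("b", 60)]
kept, starts = filter_supported(schedules, [5, 0, 30])
print([s.name for s in kept], starts)
```

Since a GA proposes a fresh start_times sequence for every candidate, the filter would probably be most useful applied once to the schedule list, so that genes exist only for supported schedules.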

Contributor

Use one leading underscore for non-public methods!

Comment thread tests/test_smart_scheduling.py
def get_env_vars():
    """get_env_vars"""

    pytest.cicada_home = os.environ.get("CICADA_HOME")

Contributor

Storing arbitrary attributes on the pytest module is an anti-pattern. If you ever run tests concurrently (using pytest-xdist), it can cause race conditions or strange scope leaks.
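
A sketch of one alternative: return the values from a plain function as an immutable object instead of attaching them to the pytest module. `EnvConfig` and `load_env_config` are hypothetical names; only the CICADA_HOME variable comes from the diff.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvConfig:
    """Immutable bundle of environment-derived settings for tests."""
    cicada_home: Optional[str]


def load_env_config(environ: Mapping[str, str] = os.environ) -> EnvConfig:
    """Read settings from the given mapping instead of mutating pytest."""
    return EnvConfig(cicada_home=environ.get("CICADA_HOME"))


config = load_env_config({"CICADA_HOME": "/opt/cicada"})
print(config.cicada_home)
```

In a test suite, a function like this could be wrapped in a session-scoped fixture in conftest.py, so each test receives the config explicitly rather than reading shared module state.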

Comment thread tests/test_smart_scheduling.py Outdated

def test_get_schedules_per_server_no_schedules_single_server(self, db_setup):
    """Test that _get_schedules_per_server raises ValueError when no schedules exist for a server"""
    try:

Contributor

You wrap the entire test body in a blanket try/except. This adds visual noise. Pytest handles unhandled exceptions natively and automatically generates rich traceback logs. Catching and re-raising general exceptions serves no functional purpose here.


def test_main_no_schedules_single_server(self, db_setup, capsys):
    """Test that main() handles servers without schedules gracefully (single server)"""
    try:

Contributor


5 participants