Merged
Commits
53 commits
c9bbe11
Add functionality to get all schedules per server
naomiwise Mar 31, 2026
ddf3333
Create taps objects for each schedule
naomiwise Mar 31, 2026
d547bb3
Get pygad to work with dummy taps
naomiwise Mar 31, 2026
9433d71
Create script to mimic current PPW taps locally and remove schedules …
naomiwise Mar 31, 2026
e3bcb84
Add functionality for updating schedules produced by smart_schedule
naomiwise Mar 31, 2026
dcd49fa
Add rollback functionality and table
naomiwise Apr 2, 2026
66c9559
Start GA from last checkpoint instead of restarting each time
naomiwise Apr 3, 2026
03d5e12
Allow GAConfig parameters to be passed in and change logic to how sta…
naomiwise Apr 3, 2026
aaea717
Add tests
naomiwise Apr 7, 2026
a37674d
Change default config to testing values
naomiwise Apr 9, 2026
559cdaf
Add blacklist and enhance rollback functionality
naomiwise Apr 9, 2026
905e06d
Add smart scheduling docs and claude config
naomiwise Apr 17, 2026
5255bed
Remove superfluous tap discretization (chunking minutes into configur…
naomiwise Apr 17, 2026
02ed343
Add better error detection and allow delete schedule to remove from s…
naomiwise Apr 17, 2026
c1d37ff
Populate schedule_backups upon creation
naomiwise Apr 20, 2026
e5930fc
Change locations in docker
naomiwise Apr 20, 2026
01e9968
Refactor shift and start_time_shift_mins
naomiwise Apr 20, 2026
9b15b81
Add unsupported taps into pygad eval
naomiwise Apr 20, 2026
1679e61
Changes how median runtime is set. Doesn't disregard new taps as an e…
naomiwise Apr 20, 2026
124f797
Address Copilot comments - SQL Injection, GA Config etc.
naomiwise Apr 20, 2026
4a54c36
Change IPs in dev setup
naomiwise Apr 20, 2026
c9d0d88
Change naming for blocklist
naomiwise May 1, 2026
19ef0da
Change Tap terminology to Schedule
naomiwise May 1, 2026
c75223d
Rollback renaming and refactoring
naomiwise May 5, 2026
e326e0c
Remove CPU references
naomiwise May 5, 2026
d962cf1
Change server_id in the backups table for spread_schedules
naomiwise May 11, 2026
122eee6
Reroute rollback in CLI
naomiwise May 19, 2026
a6d7f62
Refactor schedule backups -> adds a separate column for the smart_int…
naomiwise May 19, 2026
92c43f6
Add bulk update for schedules to reduce the amount of db roundtrips t…
naomiwise May 19, 2026
615fb84
Change snapshotting mechanism and change rollback functionality accor…
naomiwise May 20, 2026
1065193
Addressing Copilot comments
naomiwise May 20, 2026
ee4e704
Add blocklist command and rollback improvements
naomiwise May 21, 2026
a983e6d
fix incorrect rollback routing
naomiwise May 21, 2026
18349ef
Add Smart Schedule Tests
naomiwise May 21, 2026
5b54cd0
Remove superfluous blocklist schedule parameter in PyGAD and prevent…
naomiwise May 22, 2026
466947f
Switch blocklist add/remove functionality and remove reset_schedule_b…
naomiwise May 22, 2026
65a969d
Fix spread_schedules and increase snapshots stored per server to 5
naomiwise May 22, 2026
502d741
Add CLI tests for smart scheduling commands
naomiwise May 22, 2026
1ace70e
Addressing PR comments
naomiwise May 22, 2026
a1a44c8
Addressing PR comments
naomiwise May 26, 2026
fa60b1f
Rename pygad -> GAPyGAD
naomiwise May 27, 2026
b589fb2
Replace genespace with dictionary
naomiwise May 27, 2026
3cb815c
Change bulk update to chain multiple insert statements together rath…
naomiwise May 27, 2026
caa7599
Fix cleanup queries and add tests for it
naomiwise May 27, 2026
1a36f3b
Updates to reset_schedule_backups and snapshot_schedules
naomiwise May 27, 2026
2380c86
Remove dataclass decorator from Schedule class
naomiwise May 28, 2026
cbe43aa
Add test for bulk schedule update
naomiwise May 28, 2026
843eb6c
Remove snapshot cleanup for schedule-based rollback
naomiwise May 28, 2026
4994072
SmartScheduling -> smart_scheduling
naomiwise May 28, 2026
962e0d4
Change type() -> isinstance()
naomiwise Jun 1, 2026
ef9412d
Change test to use and rename ga_pygad
naomiwise Jun 1, 2026
bc47dd5
Increment version from 0.9.0 to 0.10.0
naomiwise Jun 15, 2026
5c8e6ff
update changelog
naomiwise Jun 15, 2026
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,8 @@
0.10.0
-----
- Add smart_schedule command with optimise and rollback options (as well as blocklist functionality)
- Add a new column to an existing table, plus new tables used by the smart_schedule command

0.9.0
-----
- Verify compatibility with Ubuntu 22.04
182 changes: 182 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,182 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**Cicada** is a centralized, distributed job scheduler for Pipelinewise schedules. It acts as a lightweight management layer between Linux CRON and executables, allowing jobs to be scheduled across multiple nodes via a central database rather than local cron.

Key architectural concepts:
- **Nodes/Servers**: Machines that register with Cicada and pull scheduling information from the central database. They execute `cicada exec_server_schedules` via cron.
- **Schedules**: Jobs defined in the database with cron expressions, parameters, and target servers.
- **SmartScheduling**: A Genetic Algorithm (GA) optimization module that shifts job start times to distribute load across a 24-hour period, avoiding resource conflicts.

## Development Setup

### Install and Build
```bash
make dev # Create venv with dev dependencies (black, flake8, pytest)
make # Create venv with only production dependencies
```

The project uses a standard Python venv setup. The `Makefile` is the single source of truth for build commands.

### Run Tests
```bash
make pytest # Run all tests with coverage (must be ≥80%)
```

To run a single test file or specific test:
```bash
. venv/bin/activate
pytest tests/test_lib_scheduler.py -v
pytest tests/test_lib_scheduler.py::test_function_name -v
```

### Code Quality
```bash
make flake8 # Lint (checks E9, F63, F7, F82 only, max line length 120)
make black # Format check (line length 120)
```

Black is used for code style; run it with `black --line-length 120 cicada/ tests/ --diff` to preview changes before committing.

## Codebase Structure

### Core Modules

**`cicada/lib/scheduler.py`**
- Central scheduling logic: retrieving schedules, managing execution state, cron parsing
- Functions like `get_schedule_details()`, `get_all_schedule_ids_per_server()`, `get_server_id()`
- Uses `croniter` for cron expression parsing
- Contains SQL queries for the main `schedules` and `servers` tables

**`cicada/lib/postgres.py`**
- Database connection management and helpers
- Connection pooling and statement execution

**`cicada/lib/utils.py`**
- Utility functions and decorators for exception handling and logging

**`cicada/cli.py`**
- Command dispatcher using argparse
- Routes subcommands to handlers in `cicada/commands/`

### Commands
Commands are located in `cicada/commands/` and implement specific operations:
- `exec_server_schedules.py` – Main loop executed by cron on each node; fetches and runs scheduled jobs
- `upsert_schedule.py`, `show_schedule.py`, `delete_schedule.py` – CRUD operations on schedules
- `smart_schedule.py` – Invokes GA optimization (see SmartScheduling below)
- `spread_schedules.py` – Distributes schedules across servers
- `rollback.py` – Reverts SmartScheduling changes using checkpoint history
- `register_server.py`, `archive_schedule_log.py`, `ping_slack.py` – Administrative operations

### SmartScheduling Module
Located in `cicada/lib/SmartScheduling/`

**`domain.py`**
- `Schedule` class: represents a single scheduled job, with properties:
  - `schedule_id`, `server_id`, `interval_mask` (cron expression)
  - `frequency_minutes`, `median_runtime_minutes`
  - `shift`: offset in minutes applied to the job's start time
  - `blocklisted`: flag that excludes the schedule from GA optimization
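
As a rough illustration of what `shift` means, the sketch below defines a simplified stand-in for `Schedule` and a hypothetical helper, `apply_shift`, that moves the minute and hour fields of a fixed daily cron expression. Neither is the module's actual code: the constructor signature is assumed, and only plain `M H * * *` expressions are handled.

```python
class Schedule:
    """Simplified stand-in for the Schedule class (illustrative only)."""

    def __init__(self, schedule_id, server_id, interval_mask, shift=0, blocklisted=False):
        self.schedule_id = schedule_id
        self.server_id = server_id
        self.interval_mask = interval_mask  # 5-field cron expression
        self.shift = shift  # offset in minutes
        self.blocklisted = blocklisted


def apply_shift(schedule):
    """Return a cron expression with the start time moved by schedule.shift minutes.

    Hypothetical helper: supports only 'M H * * *' with numeric M and H.
    """
    minute, hour, dom, month, dow = schedule.interval_mask.split()
    if not (minute.isdigit() and hour.isdigit()):
        raise ValueError("only fixed daily times are supported in this sketch")
    # Work in minutes since midnight and wrap around at 24 hours.
    total = (int(hour) * 60 + int(minute) + schedule.shift) % (24 * 60)
    return f"{total % 60} {total // 60} {dom} {month} {dow}"
```

A shift that crosses midnight wraps to the start of the day, which is only safe when the day-of-month and day-of-week fields are `*`.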

**`config.py`**
- `GAConfig` dataclass: hyperparameters for the genetic algorithm
  - `num_generations`, `sol_per_pop`, `mutation_percent_genes`, etc.
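
A minimal sketch of what such a config object could look like. The field names and default values are taken from the `smart_schedule optimise` CLI help strings, so they are assumptions about `config.py` rather than a copy of it. `build_ga_config` is a hypothetical helper showing one way to ignore options the CLI left as `None`.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GAConfig:
    # Defaults mirror the CLI help text; the real config.py may differ.
    num_generations: int = 20
    sol_per_pop: int = 40
    num_parents_mating: int = 10
    mutation_percent_genes: int = 20
    parent_selection_type: str = "rank"
    crossover_type: str = "uniform"
    mutation_type: str = "random"
    keep_elitism: int = 2
    random_seed: Optional[int] = None


def build_ga_config(overrides):
    """Hypothetical helper: apply only the overrides that were actually provided."""
    known = {f.name for f in fields(GAConfig)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown GA options: {sorted(unknown)}")
    # argparse leaves unset options as None, so drop those and keep the defaults.
    provided = {key: value for key, value in overrides.items() if value is not None}
    return GAConfig(**provided)
```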

**`pygad.py`**
- Wraps the external `pygad` library (genetic algorithm)
- Fitness function: evaluates how well a shift assignment distributes load
- Implements crossover and mutation operations on shifts

**`evaluation.py`**
- Scoring logic: calculates resource contention, overlap penalties, and fitness metrics
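
To make "overlap penalties" concrete, here is one possible scoring scheme, sketched under assumptions: each job is reduced to a `(start_minute, frequency_minutes, runtime_minutes)` tuple, and the penalty is the number of extra concurrent jobs summed over every minute of the day. The function names and the metric itself are illustrative, not the contents of `evaluation.py`.

```python
MINUTES_PER_DAY = 24 * 60


def load_profile(jobs):
    """Count how many jobs are running in each minute of a 24-hour day.

    Each job is a (start_minute, frequency_minutes, runtime_minutes) tuple.
    """
    load = [0] * MINUTES_PER_DAY
    for start, frequency, runtime in jobs:
        # Every run of the job within the day, starting from its first slot.
        for run_start in range(start % frequency, MINUTES_PER_DAY, frequency):
            for offset in range(runtime):
                load[(run_start + offset) % MINUTES_PER_DAY] += 1
    return load


def fitness(jobs):
    """Higher is better: penalise every minute in which more than one job runs."""
    overlap_penalty = sum(count - 1 for count in load_profile(jobs) if count > 1)
    return -overlap_penalty
```

Two daily 10-minute jobs that start together score -10, and moving one of them by 10 minutes brings the score to 0.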

### Database Schema
Key tables:
- `servers` – Registered nodes with hostname, FQDN, IP address
- `schedules` – Job definitions with cron expressions, parameters, execution state
- `schedule_logs` – Historical execution records with runtime, status, output
- `snapshots` – Metadata about optimization/rollback events (reason, timestamp, server_id)
- `schedule_backups` – Schedule state snapshots: stores `interval_mask` and `smart_interval_mask` at each snapshot for potential rollback
- `schedule_changes` – Linked-list audit trail of all changes to schedules (replaces older snapshot model); each entry has `previous_change_id` for chain traversal, `changes_delta` (JSON) for what changed

Database setup SQL is in `setup/db_and_user.sql` and `setup/schema.sql`. The migration script is `setup/migrate_snapshots_to_changes.sql`. An example schedule setup for smart scheduling is in `setup/create_test_tap_setup`.

## Key Architectural Patterns

### Cron Expression Handling
- All scheduling uses standard cron format (5 fields: minute hour dom month dow)
- `croniter` library parses expressions and calculates next/previous execution times

### Command Execution
- Jobs are executed as shell commands by `exec_server_schedules`
- Commands can include parameters via template substitution
- Outputs and exit codes are logged to `schedule_logs` table
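
The three points above could be sketched roughly as follows. The `$name` placeholder syntax, the `run_job` name, and the returned dictionary are assumptions for illustration; the source does not specify how Cicada substitutes parameters or what it writes to `schedule_logs`.

```python
import subprocess
from string import Template


def run_job(command_template, parameters):
    """Substitute parameters into the command, run it in a shell, and capture the result."""
    # Assumed placeholder syntax: $name, filled from the parameters mapping.
    command = Template(command_template).substitute(parameters)
    completed = subprocess.run(command, shell=True, capture_output=True, text=True)
    return {
        "command": command,
        "exit_code": completed.returncode,
        "output": completed.stdout + completed.stderr,
    }
```

Because the command goes through a shell, parameter values in a sketch like this would need to come from a trusted source or be quoted first.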

### Configuration
- Database connection details from `config/definitions.yml` (user must create from `config/example.yml`)
- Each command may accept CLI flags (e.g., `--schedule_id`, `--adhoc_execute`)

### SmartScheduling Workflow
1. **Load schedules**: Fetch all schedules for a server via `get_schedules_per_server()`
2. **Create Schedule objects**: Convert schedule details to Schedule instances; filter unsupported schedules (irregular cron, too frequent, blocklisted)
3. **Run GA optimization**: PyGAD evolves shifts over N generations to minimize resource conflicts
4. **Apply and checkpoint**: Save optimized shifts back to DB; record change entry via `record_schedule_change()` for audit trail and rollback
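
Steps 2 to 4 can be sketched as a small pipeline. Everything here is hypothetical: schedules are plain dictionaries, `frequency_minutes` set to `None` stands for an irregular cron expression, the 60-minute threshold is an assumed value, and the optimiser and change recorder are passed in as callables rather than calling PyGAD or the database.

```python
def is_supported(schedule, min_frequency_minutes=60):
    """Step 2 filter. The 60-minute threshold is an assumed value."""
    return (
        not schedule["blocklisted"]
        and schedule["frequency_minutes"] is not None  # None marks an irregular cron here
        and schedule["frequency_minutes"] >= min_frequency_minutes
    )


def run_smart_schedule(schedules, optimise, record_change):
    """Run steps 2-4 on schedules that have already been loaded.

    optimise: callable taking the supported schedules and returning {schedule_id: shift}.
    record_change: callable invoked once for each schedule whose shift changed.
    """
    supported = [s for s in schedules if is_supported(s)]
    shifts = optimise(supported)
    applied = {}
    for schedule in supported:
        new_shift = shifts.get(schedule["schedule_id"], schedule["shift"])
        if new_shift != schedule["shift"]:
            record_change(schedule["schedule_id"], schedule["shift"], new_shift)
            applied[schedule["schedule_id"]] = new_shift
    return applied
```

This sketch drops unsupported schedules entirely, whereas the real module may still count them as fixed load during evaluation.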

### Rollback System
Cicada supports two rollback mechanisms:

**Full Rollback** (`--full` flag):
- Sets `smart_interval_mask = NULL` for affected schedules, reverting to original `interval_mask`
- Works per-schedule or per-server
- Records a `ROLLBACK_FULL` change entry in `schedule_changes`

**Rollback to Specific Change** (`--change-id` flag):
- Uses linked-list traversal via `compute_schedule_state_at_change()` to reconstruct schedule state at any historical change
- Requires `schedule_id` and `change_id`
- Records a `ROLLBACK_TO_CHANGE` entry documenting what was restored
- Marks the target change as reverted

**Change History** (`--history` flag):
- Displays complete audit trail for a schedule via `get_schedule_history()`
- Each entry shows reason, timestamp, and delta (what changed)

**Migration Note**: The old snapshot/`schedule_backups` model supported only the last 3 snapshots. The new `schedule_changes` model retains unlimited history via its linked-list structure.
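
The linked-list traversal can be illustrated with an in-memory sketch. It assumes each `changes_delta` holds the new values of the fields that the change set, and that rows are available as a dictionary keyed by change id. The function name and data layout are hypothetical and do not reproduce `compute_schedule_state_at_change()`.

```python
def compute_state_at_change(changes, latest_change_id, target_change_id):
    """Rebuild a schedule's state as of target_change_id.

    changes maps change_id -> {"previous_change_id": ..., "changes_delta": {...}}.
    """
    # Walk back from the newest change to the target, confirming it is on the chain.
    cursor = latest_change_id
    while cursor is not None and cursor != target_change_id:
        cursor = changes[cursor]["previous_change_id"]
    if cursor is None:
        raise KeyError(f"change {target_change_id} is not in this schedule's history")

    # Collect the chain from the target back to the first change, then replay oldest first.
    chain = []
    while cursor is not None:
        chain.append(changes[cursor])
        cursor = changes[cursor]["previous_change_id"]
    state = {}
    for change in reversed(chain):
        state.update(change["changes_delta"])
    return state
```

Replaying from the oldest entry keeps the function independent of the current row in `schedules`, at the cost of reading the whole chain for every lookup.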

## Testing

Tests are in `tests/` and use `pytest` with fixtures:
- `test_functional_main.py` – Integration tests for the main execution loop
- `test_functional_cli_entrypoint.py` – CLI command tests
- `test_functional_spread_schedules.py` – SmartScheduling and load distribution tests
- `test_lib_scheduler.py` – Unit tests for scheduler utility functions
- `test_lib_postgres.py` – Database connection tests

Mock fixtures often include a test PostgreSQL database or in-memory alternatives. Freezegun is used for time-based testing.

## Common Development Tasks

### Adding a New CLI Command
- Create a new file in `cicada/commands/` with a `main()` function
- Import and add an entry point in `cicada/cli.py`
- Add tests in `tests/test_functional_cli_entrypoint.py`

### Modifying Schedule Logic
- Edit `cicada/lib/scheduler.py` for core logic changes (e.g., new state transitions)
- Update `cicada/lib/SmartScheduling/domain.py` if Schedule validation rules change
- Update tests in `test_lib_scheduler.py` to cover new behavior

### Database Schema Changes
- Modify SQL in `setup/schema.sql` (note: existing deployments require migration scripts)
- Update query strings in `scheduler.py` and corresponding test fixtures

## Important Notes

- **PostgreSQL only**: Only PostgreSQL is supported (versions 12.9–15.14 verified)
- **No external APIs**: Uses only core Python and database; runs offline
- **Cron safety**: Jobs execute only while the registered server node is running; they respect cron expressions and database state
- **Rollback support**: SmartScheduling changes can be rolled back via checkpoints stored in the database
- **Line length**: Maximum 120 characters (enforced by Black and Flake8)
- **Code coverage**: Must maintain ≥80% test coverage for commits
131 changes: 131 additions & 0 deletions cicada/cli.py
@@ -8,6 +8,7 @@
from cicada.lib import utils

from cicada.commands import register_server
from cicada.commands import smart_schedule_rollback
from cicada.commands import list_server_schedules
from cicada.commands import exec_server_schedules
from cicada.commands import upsert_schedule
@@ -18,6 +19,8 @@
from cicada.commands import ping_slack
from cicada.commands import list_schedules
from cicada.commands import delete_schedule
from cicada.commands import smart_schedule
from cicada.commands import blocklist_schedule as blocklist_schedule_cmd


@utils.named_exception_handler("Cicada")
@@ -29,6 +32,7 @@ def __init__(self):
"register_server",
"list_server_schedules",
"exec_server_schedules",
"smart_schedule",
"show_schedule",
"upsert_schedule",
"exec_schedule",
@@ -273,6 +277,133 @@ def delete_schedule():
        args = parser.parse_args(sys.argv[2:])
        delete_schedule.main(args.schedule_id)

    @staticmethod
    def smart_schedule():
        """Generate smart schedules for a server using genetic algorithm, or rollback/manage blocklist"""
        parser = argparse.ArgumentParser(
            allow_abbrev=False,
            add_help=True,
            prog=inspect.stack()[0][3],
            description="Generate smart schedules for a server using genetic algorithm, or rollback previous changes, or manage schedule blocklist",
        )

        # Subcommands: optimise, rollback, blocklist
        subparsers = parser.add_subparsers(dest="action", help="Action to perform. Options: optimise (default), rollback, or blocklist")

        # (Default) optimise subcommand
        optimise_parser = subparsers.add_parser(
            "optimise",
            help="optimise schedules using genetic algorithm",
            add_help=True,
        )
        optimise_parser.add_argument("--server_id", type=int, required=False, help="ID of the server")

        # Optional GA Configurations
        ga_config = optimise_parser.add_argument_group("ga_config", "Optional configurations for the genetic algorithm optimiser")
        ga_config.add_argument("--num_generations",type=int,required=False, help="Number of generations for the genetic algorithm. Default: 20")
        ga_config.add_argument("--sol_per_pop",type=int,required=False, help="Number of solutions per population for the genetic algorithm. Default: 40")
        ga_config.add_argument("--num_parents_mating",type=int,required=False, help="Number of parents mating for the genetic algorithm. Default: 10")
        ga_config.add_argument("--mutation_percent_genes",type=int,required=False, help="Mutation percentage of genes for the genetic algorithm. Default: 20")
        ga_config.add_argument("--parent_selection_type",type=str,required=False, help="Parent selection type for the genetic algorithm. Allowed values: ['sss', 'rws', 'sus', 'tournament', 'rank', 'random']. Default: rank")
        ga_config.add_argument("--crossover_type",type=str,required=False, help="Crossover type for the genetic algorithm. Allowed values: ['single_point', 'two_point', 'uniform']. Default: uniform")
        ga_config.add_argument("--mutation_type",type=str,required=False, help="Mutation type for the genetic algorithm. Allowed values: ['random', 'swap', 'inversion', 'scramble']. Default: random")
        ga_config.add_argument("--keep_elitism",type=int,required=False, help="Number of elite solutions to keep for the next generation. Default: 2")
        ga_config.add_argument("--random_seed",type=int,required=False, help="Set a random seed to get repeatable results. Default: None")

        # Rollback subcommand
        rollback_parser = subparsers.add_parser(
            "rollback",
            help="Rollback to original or previous cron schedules",
            add_help=True,
            prog=inspect.stack()[0][3],
            description="Rollback for smart scheduling, it resets the schedule to its original cron in case of any issues",
        )

        # Mutually exclusive flags for rollback mode
        rollback_mode = rollback_parser.add_mutually_exclusive_group(required=True)
        rollback_mode.add_argument(
            "--full",
            default=False,
            action="store_true",
            help="Rollback to original schedule (set smart_interval_mask to NULL)",
        )
        rollback_mode.add_argument(
            "--previous",
            default=False,
            action="store_true",
            help="Rollback to most recent snapshot (step back one optimization)",
        )

        # Add mutually exclusive arguments for rollback subcommand to specify either server_id or schedule_id for targeted rollback
        group = rollback_parser.add_mutually_exclusive_group()
        group.add_argument(
            "--server_id",
            type=int,
            required=False,
            help="ID of the server to rollback, if not specified will rollback all servers",
        )
        group.add_argument("--schedule_id", type=str, required=False, help="ID of the schedule to rollback")


        # Blocklist subcommand
        blocklist_parser = subparsers.add_parser(
            "blocklist",
            help="Add or remove a schedule from the blocklist (excluded from smart scheduling optimization)",
            add_help=True,
        )
        blocklist_parser.add_argument(
            "--schedule_id",
            type=str,
            required=True,
            help="Id of the schedule to blocklist/unblocklist",
        )
        blocklist_parser.add_argument(
            "--remove",
            default=False,
            action="store_true",
            help="Remove the schedule from the blocklist instead of adding it",
        )
        blocklist_parser.add_argument(
            "--reason",
            type=str,
            required=False,
            help="Reason for blocklisting (optional, only used when adding)",
        )

        # Parse arguments and call smart_schedule.main with appropriate arguments based on subcommand
        args = parser.parse_args(sys.argv[2:])

        if args.action == "optimise" or args.action is None:
            optimise_args = optimise_parser.parse_args(sys.argv[3:])
            smart_schedule.main(
                server_id=optimise_args.server_id,
                ga_config={
                    "num_generations": optimise_args.num_generations,
                    "sol_per_pop": optimise_args.sol_per_pop,
                    "num_parents_mating": optimise_args.num_parents_mating,
                    "mutation_percent_genes": optimise_args.mutation_percent_genes,
                    "parent_selection_type": optimise_args.parent_selection_type,
                    "crossover_type": optimise_args.crossover_type,
                    "mutation_type": optimise_args.mutation_type,
                    "keep_elitism": optimise_args.keep_elitism,
                    "random_seed": optimise_args.random_seed,
                },
            )
        elif args.action == "rollback":
            rollback_args = rollback_parser.parse_args(sys.argv[3:])
            smart_schedule_rollback.main(
                server_id=rollback_args.server_id,
                schedule_id=rollback_args.schedule_id,
                full=rollback_args.full,
                previous=rollback_args.previous)
        elif args.action == "blocklist":
            blocklist_args = blocklist_parser.parse_args(sys.argv[3:])
            blocklist_schedule_cmd.main(
                schedule_id=blocklist_args.schedule_id,
                remove=blocklist_args.remove,
                reason=blocklist_args.reason,
            )

    @staticmethod
    def version():
        """Return version of cicada package"""
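
The sub-command layout in this hunk can be exercised in isolation with a reduced parser. The sketch below keeps only a few of the options and uses illustrative function names (`build_parser`, `parse`). It shows two properties of `argparse`: the top-level `parse_args` call already places sub-command options on the returned namespace, and an invocation with no sub-command leaves `action` as `None`.

```python
import argparse


def build_parser():
    """Reduced version of the smart_schedule parser, for illustration only."""
    parser = argparse.ArgumentParser(prog="smart_schedule", allow_abbrev=False)
    subparsers = parser.add_subparsers(dest="action")

    optimise = subparsers.add_parser("optimise")
    optimise.add_argument("--server_id", type=int)
    optimise.add_argument("--num_generations", type=int)

    rollback = subparsers.add_parser("rollback")
    # Exactly one rollback mode must be chosen.
    mode = rollback.add_mutually_exclusive_group(required=True)
    mode.add_argument("--full", action="store_true")
    mode.add_argument("--previous", action="store_true")
    # At most one target may be given.
    target = rollback.add_mutually_exclusive_group()
    target.add_argument("--server_id", type=int)
    target.add_argument("--schedule_id", type=str)
    return parser


def parse(argv):
    """Parse once; sub-command options are already present on the returned namespace."""
    return vars(build_parser().parse_args(argv))
```

For example, `parse(["rollback", "--full", "--server_id", "3"])` yields the action together with its flags in a single dictionary, so a second `parse_args` call on the sub-parser is one option rather than a requirement.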