fix(kernel): preserve cron jobs across hand reactivation #1019

Open
letzdoo-js wants to merge 1 commit into RightNow-AI:main from letzdoo-js:upstream-pr/cron-loss

@letzdoo-js

Summary

Cron jobs are silently destroyed across daemon restarts for hand-style agents. Root cause: in activate_hand(), kill_agent(old.id) runs before the new agent is spawned.
kill_agent calls cron_scheduler.remove_agent_jobs(), which deletes the agent's jobs from memory and persists [] to cron_jobs.json on disk.
The reassign_agent_jobs() call further down (added by #461 to fix exactly this class of bug) is therefore always a no-op: by the time it runs, the jobs are already gone.
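
The ordering problem described above can be reproduced with a toy model. This is only a sketch: `ToyScheduler` and its methods are hypothetical stand-ins that borrow the names used in this description, not the real `CronScheduler` API, and agent IDs are plain integers for brevity.

```rust
use std::collections::HashMap;

/// Toy stand-in for the scheduler: maps an agent ID to its job names.
/// (Hypothetical type; the real CronScheduler is not reproduced here.)
#[derive(Default)]
struct ToyScheduler {
    jobs: HashMap<u32, Vec<String>>,
}

impl ToyScheduler {
    fn add_job(&mut self, agent: u32, name: &str) {
        self.jobs.entry(agent).or_default().push(name.to_string());
    }

    /// Models the effect attributed to remove_agent_jobs(): drop every job
    /// owned by `agent`.
    fn remove_agent_jobs(&mut self, agent: u32) {
        self.jobs.remove(&agent);
    }

    /// Moves jobs from `old` to `new` and returns how many were moved.
    fn reassign_agent_jobs(&mut self, old: u32, new: u32) -> usize {
        match self.jobs.remove(&old) {
            Some(moved) => {
                let n = moved.len();
                self.jobs.entry(new).or_default().extend(moved);
                n
            }
            None => 0,
        }
    }

    fn count(&self, agent: u32) -> usize {
        self.jobs.get(&agent).map_or(0, Vec::len)
    }
}

/// Buggy ordering: the kill-side removal runs first, so the later
/// reassignment finds nothing to move.
/// Returns (jobs moved, jobs owned by the new agent).
fn buggy_reactivation() -> (usize, usize) {
    let mut s = ToyScheduler::default();
    s.add_job(1, "nightly");
    s.add_job(1, "hourly");
    s.remove_agent_jobs(1); // what kill_agent(old.id) triggers
    let moved = s.reassign_agent_jobs(1, 2); // no-op: jobs already gone
    (moved, s.count(2))
}
```

In this model the reassignment itself is correct; it fails only because it runs after the removal, which is the point made about #461 above.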

Symptom: every daemon restart wipes cron_jobs.json for hand-managed agents. /api/cron/jobs returns empty. No error logged. Users report "my crons disappeared" with no explanation.

This PR fixes it by snapshotting the cron jobs into a local Vec<CronJob> before kill_agent, then re-adding them under the new agent ID after spawn_agent_with_parent.
This is the exact same pattern already used for saved_triggers immediately above (which fixed the analogous bug for triggers in #519).

Fixes the "cron jobs lost on restart" regression that #461's original fix attempted but did not actually resolve due to the operation ordering.

Changes

  • crates/openfang-kernel/src/kernel.rs — in activate_hand():
    • Snapshot saved_crons: Vec<CronJob> from cron_scheduler.list_jobs(old_id) before kill_agent (mirrors the existing saved_triggers snapshot 3 lines above)
    • After spawn_agent_with_parent, re-add each saved cron under the new agent_id, resetting next_run and last_run so jobs get a fresh schedule
    • Persist once after the bulk re-add
    • Existing reassign_agent_jobs() block kept as a defensive safety net (it's now redundant in the common path but harmless)
  • 41 insertions, 3 deletions, 1 file
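
The change list above boils down to snapshot, kill, spawn, re-add, persist. The following is a minimal sketch of that sequence, not the actual diff: `ToyJob` and `ToyScheduler` are hypothetical simplifications, IDs are integers, timestamps are `Option<u64>`, and persistence is modeled as a counter.

```rust
/// Hypothetical, simplified job record; the real CronJob has more fields.
#[derive(Clone, Debug, PartialEq)]
struct ToyJob {
    agent_id: u32,
    name: String,
    next_run: Option<u64>,
    last_run: Option<u64>,
}

/// Hypothetical scheduler that counts how often it "writes to disk".
#[derive(Default)]
struct ToyScheduler {
    jobs: Vec<ToyJob>,
    persist_calls: usize,
}

impl ToyScheduler {
    fn list_jobs(&self, agent: u32) -> Vec<ToyJob> {
        self.jobs.iter().filter(|j| j.agent_id == agent).cloned().collect()
    }

    /// Models the kill-side cleanup: remove the agent's jobs and persist.
    fn remove_agent_jobs(&mut self, agent: u32) {
        self.jobs.retain(|j| j.agent_id != agent);
        self.persist();
    }

    fn add_job(&mut self, job: ToyJob) {
        self.jobs.push(job);
    }

    fn persist(&mut self) {
        self.persist_calls += 1;
    }
}

/// Reactivation with the corrected ordering.
fn reactivate(s: &mut ToyScheduler, old_id: u32, new_id: u32) {
    // 1. Snapshot before anything destructive happens.
    let saved = s.list_jobs(old_id);
    // 2. Side effect of killing the old agent.
    s.remove_agent_jobs(old_id);
    // 3. (Spawning the new agent is elided in this sketch.)
    // 4. Re-add under the new ID with runtime state reset.
    for mut job in saved {
        job.agent_id = new_id;
        job.next_run = None;
        job.last_run = None;
        s.add_job(job);
    }
    // 5. Persist once after the bulk re-add.
    s.persist();
}
```

Owning the snapshot as a `Vec` keeps it independent of the scheduler's internal state, so the removal in step 2 cannot affect it.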

Testing

  • cargo check -p openfang-kernel --lib passes clean against upstream/main (v0.5.7) — no warnings beyond the pre-existing imap-proto future-incompat note
  • cargo clippy --workspace --all-targets -- -D warnings: could not run on the full workspace locally (it requires the gdk-3.0 system lib for an unrelated GUI crate); the kernel crate
    alone is clean. CI will exercise the full workspace.
  • cargo test --workspace: not run locally because of the same gdk-3.0 limitation; kernel-level unit tests for cron_scheduler.list_jobs and add_job already exist, and the patch only
    uses public APIs covered by those tests.
  • Live integration tested manually:
    1. Activate a hand agent
    2. POST /api/cron/jobs to create 3 jobs against it
    3. Verify /api/cron/jobs lists them and cron_jobs.json contains them
    4. docker restart the daemon
    5. Before this fix: /api/cron/jobs returns empty, cron_jobs.json is []
    6. After this fix: all 3 jobs still present and persisted, with the new agent UUID

Security

  • No new unsafe code
  • No secrets or API keys in diff
  • User input validated at boundaries — patch only manipulates already-validated CronJob instances loaded from the trusted CronScheduler; add_job() re-runs its existing validation
    (max-jobs limit, per-agent count, schedule validity) on every restored job
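
To illustrate what re-running validation on every restored job can look like, here is a sketch under stated assumptions. The limit fields, the error variants, and the empty-string schedule check are invented for illustration; the real `add_job()` checks and its error type are not shown in this PR.

```rust
/// Hypothetical rejection reasons, loosely named after the checks listed above.
#[derive(Debug, PartialEq)]
enum AddError {
    GlobalLimit,
    PerAgentLimit,
    EmptySchedule,
}

/// Hypothetical limits; the real values live in the scheduler.
struct Limits {
    max_total: usize,
    max_per_agent: usize,
}

struct ToyJob {
    agent_id: u32,
    schedule: String,
}

/// Stand-in for the validation add_job() is described as performing.
fn validate(existing: &[ToyJob], candidate: &ToyJob, limits: &Limits) -> Result<(), AddError> {
    if candidate.schedule.trim().is_empty() {
        return Err(AddError::EmptySchedule);
    }
    if existing.len() >= limits.max_total {
        return Err(AddError::GlobalLimit);
    }
    let per_agent = existing
        .iter()
        .filter(|j| j.agent_id == candidate.agent_id)
        .count();
    if per_agent >= limits.max_per_agent {
        return Err(AddError::PerAgentLimit);
    }
    Ok(())
}

/// Re-adds saved jobs one by one; returns (restored, rejected).
/// A rejected job does not abort the rest of the restore.
fn restore(existing: &mut Vec<ToyJob>, saved: Vec<ToyJob>, limits: &Limits) -> (usize, usize) {
    let mut restored = 0;
    let mut rejected = 0;
    for job in saved {
        match validate(existing.as_slice(), &job, limits) {
            Ok(()) => {
                existing.push(job);
                restored += 1;
            }
            Err(_) => rejected += 1,
        }
    }
    (restored, rejected)
}
```

Because the restored jobs previously passed the same checks under the old agent ID, rejections should be rare in practice; whether the patch skips or aborts on a rejected job is an assumption of this sketch.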

Bug: in activate_hand(), kill_agent() is called on the existing agent
BEFORE the new agent is spawned. kill_agent() invokes
cron_scheduler.remove_agent_jobs() which deletes all cron jobs from memory
AND persists [] to cron_jobs.json. The reassign_agent_jobs() call further
down was meant to migrate jobs from old to new (per RightNow-AI#461), but it always
runs as a no-op because the jobs are already gone — the order of
operations defeats the fix.

Symptom: every daemon restart silently destroys cron jobs for hand-style
agents. cron_jobs.json is rewritten as []. /api/cron/jobs returns empty.
No error message.

Fix: snapshot the cron jobs into a local Vec BEFORE kill_agent (same
pattern as saved_triggers above), then re-add them under the new agent_id
AFTER spawn_agent_with_parent. Runtime state (next_run, last_run) is
reset so jobs get a fresh start. The existing reassign_agent_jobs()
block is kept as a defensive safety net but is now redundant in the
common path.

Verified with cargo check -p openfang-kernel --lib (clean compile, no
warnings).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>