Runtime: add auto-resume-until-clear mode for transient failures #50

@jafreck

Problem

  • After a transient failure, operators currently have to rerun migrate by hand, often several times, to progressively clear blocked tasks.

Proposal

  • Add an opt-in mode that periodically resumes from checkpoint with backoff until no transiently blocked tasks remain or a stop condition is met.
  • Respect retry budgets/circuit breakers and emit clear status summaries per cycle.
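
To make the proposed control flow concrete, here is a minimal sketch of the resume loop in Python. All names (`CycleResult`, `auto_resume`, `resume_once`, the delay parameters) are hypothetical and not taken from the runtime. Exponential backoff and checking terminal failures before success are assumptions, and retry budgets and circuit breakers are not modeled here.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CycleResult:
    blocked_transient: int  # tasks still blocked by transient failures
    failed_terminal: int    # tasks that failed for non-transient reasons


def auto_resume(
    resume_once: Callable[[], CycleResult],  # hypothetical: one resume-from-checkpoint run
    max_duration_s: float,
    base_delay_s: float = 5.0,
    max_delay_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Resume repeatedly and return the reason the loop stopped."""
    start = clock()
    cycle = 0
    while True:
        cycle += 1
        result = resume_once()
        # Per-cycle status summary.
        print(
            f"cycle {cycle}: blocked={result.blocked_transient} "
            f"failed={result.failed_terminal}"
        )
        if result.failed_terminal > 0:
            return "terminal_failure"
        if result.blocked_transient == 0:
            return "success"
        # Assumed backoff policy: double the delay each cycle, up to a cap.
        delay = min(max_delay_s, base_delay_s * 2 ** (cycle - 1))
        # Stop rather than sleep past the configured maximum duration.
        if clock() - start + delay > max_duration_s:
            return "max_duration"
        sleep(delay)
```

Injecting `sleep` and `clock` is a deliberate choice in this sketch: it lets tests drive the loop with a fake clock instead of waiting in real time.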

Acceptance Criteria

  • New CLI/runtime option enables automatic resume cycles.
  • Cycles stop on success, on a non-transient terminal failure, or once the configured maximum duration is reached.
  • Summary output includes remaining blocked/failed tasks each cycle.
  • Add tests for control-flow and stop conditions.
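
For the summary criterion, one possible shape for the per-cycle line is sketched below in Python. The function name and the `key=value` layout are assumptions made for illustration; the issue does not prescribe an output format.

```python
from typing import Optional


def format_cycle_summary(
    cycle: int,
    blocked: int,
    failed: int,
    next_delay_s: Optional[float],
) -> str:
    """Build one status line per resume cycle (hypothetical format)."""
    status = f"cycle={cycle} blocked={blocked} failed={failed}"
    if next_delay_s is None:
        # No further cycle is scheduled: a stop condition was reached.
        return f"{status} next=stop"
    return f"{status} next_resume_in={next_delay_s:.0f}s"


# Example using the blocked counts from the context reference (5 -> 4 -> 2).
for cycle, blocked in enumerate([5, 4, 2], start=1):
    print(format_cycle_summary(cycle, blocked, 0, 5.0 * 2 ** (cycle - 1)))
```

Keeping the formatter a pure function means the summary criterion can be tested without running any resume cycles.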

Context reference

  • The blocked-task count dropped across successive manual resume runs (5 -> 4 -> 2), which suggests these reruns can be automated.
