Add task criticality levels to control panic recovery behavior #324

@Iamdavidonuh

Problem

The current install_panic_hook in TaskManager treats all panics equally: a panic on any thread triggers a full shutdown of the binary. This is correct for critical tasks like the server loop, but overly aggressive for non-critical background tasks (e.g., persistence flush, size calculation), where a panic could be logged and the work retried on the next interval.

Proposed Solution

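Replace the global panic hook approach with per-task panic handling: wrap each task invocation in std::panic::catch_unwind (with the closure wrapped in AssertUnwindSafe) inside spawn_task_loop and spawn_blocking, and decide what to do with a caught panic based on a per-task criticality level.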

  pub enum TaskCriticality {
      /// Panic triggers shutdown of all tasks (e.g., server loop)
      Critical,
      /// Panic is logged, task continues on next loop iteration (e.g., persistence, size calc)
      Recoverable,
  }

• Critical tasks: on panic, cancel the CancellationToken to trigger graceful shutdown
• Recoverable tasks: on panic, log the error and continue the loop (return TaskState::Continue)
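
To make the two behaviors above concrete, here is a minimal sketch of how one loop iteration could be wrapped. It is an illustration under assumptions, not the existing TaskManager code: run_iteration is a hypothetical helper, TaskState::Stop is an assumed variant (only TaskState::Continue is named above), and an Arc<AtomicBool> stands in for the CancellationToken so the example needs only the standard library.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TaskCriticality {
    /// Panic triggers shutdown of all tasks (e.g., server loop)
    Critical,
    /// Panic is logged, task continues on next loop iteration
    Recoverable,
}

/// Assumed shape of the loop result; `Stop` is hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum TaskState {
    Continue,
    Stop,
}

/// Hypothetical helper: runs one iteration of a task and maps a panic
/// to a `TaskState` according to the task's criticality.
/// `shutdown` is a stand-in for cancelling the real CancellationToken.
pub fn run_iteration<F>(
    criticality: TaskCriticality,
    shutdown: &Arc<AtomicBool>,
    step: F,
) -> TaskState
where
    F: FnOnce() -> TaskState,
{
    match panic::catch_unwind(AssertUnwindSafe(step)) {
        // No panic: pass the task's own result through unchanged.
        Ok(state) => state,
        Err(payload) => {
            // Panic payloads are usually &str or String; fall back otherwise.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "non-string panic payload".to_string());
            eprintln!("task panicked ({criticality:?}): {msg}");

            match criticality {
                // Critical: signal shutdown and stop this loop.
                TaskCriticality::Critical => {
                    shutdown.store(true, Ordering::SeqCst);
                    TaskState::Stop
                }
                // Recoverable: keep looping; the next interval retries.
                TaskCriticality::Recoverable => TaskState::Continue,
            }
        }
    }
}
```

Note that catch_unwind only catches unwinding panics, so this approach would have no effect in a build compiled with panic = "abort". AssertUnwindSafe also shifts responsibility to the task author: state touched by a Recoverable task may be left half-updated by a panic and must still be safe to reuse on the next iteration.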

This requires adding a criticality() method to the Task and BlockingTask traits (defaulting to Critical for safety).
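
As a rough sketch of the trait change, a default method keeps existing implementors Critical unless they opt out. The trait shape here is an assumption: the real Task and BlockingTask traits have other methods not shown in this issue, and name(), ServerLoop, and PersistenceFlush are hypothetical names used only for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TaskCriticality {
    Critical,
    Recoverable,
}

/// Simplified stand-in for the real Task trait.
pub trait Task {
    /// Hypothetical method, present only to give the trait a required item.
    fn name(&self) -> &'static str;

    /// Defaults to Critical, so a task that does not override this
    /// keeps today's behavior: a panic shuts everything down.
    fn criticality(&self) -> TaskCriticality {
        TaskCriticality::Critical
    }
}

/// Hypothetical critical task: relies on the default.
pub struct ServerLoop;

impl Task for ServerLoop {
    fn name(&self) -> &'static str {
        "server_loop"
    }
}

/// Hypothetical background task: explicitly opts into recovery.
pub struct PersistenceFlush;

impl Task for PersistenceFlush {
    fn name(&self) -> &'static str {
        "persistence_flush"
    }

    fn criticality(&self) -> TaskCriticality {
        TaskCriticality::Recoverable
    }
}
```

Defaulting to Critical means opting into Recoverable is a deliberate, reviewable choice per task, and forgetting to override the method errs on the side of shutting down rather than silently continuing.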

Context

This came out of the papaya panic issue (#318). We added a global panic hook (#323) to prevent the binary from staying in limbo after a worker thread panic. That fix is correct as a baseline, but we should be more granular about which panics are actually fatal.
