Skip to content

Triage tiered fallback: cloud → wait-and-retry → fallback-local → defer #1257

@sanil-23

Description

@sanil-23

Summary

Replace triage's current binary "cloud OR local-only" routing with a tiered fallback chain: cloud first → on 429 / transient failure wait-and-retry → on persistent failure fall through to local → on local failure defer the turn (via JobOutcome::Defer).

Problem

agent/triage/evaluator.rs currently picks one of cloud or local based on resolve_provider, runs the call, and surfaces any failure to the orchestrator. There's no graceful degradation:

  • 429 from the cloud → triage fails → user-visible toast.
  • Local model not loaded yet → triage fails the same way.
  • Cloud is hard-down → no automatic fallback to local even though the local arm is fully wired.

Every other tiered system (HTTP retry, mailgun, S3) has this fallback chain. Triage doesn't because the routing logic predates the local-arm gating from #1073.

Solution

In triage/evaluator.rs::evaluate:

  1. Cloud first when resolve_provider == cloud.
  2. On 429: cooperate with the cloud rate limiter (separate issue) — wait Retry-After, retry once. After the second 429, fall through.
  3. On transient cloud failure (timeout, 5xx, connection): retry once with backoff, then fall through.
  4. Fallback-local: acquire LlmPermit (already wired by Add throttling and power-aware execution for local LLM inference #1073) and run the same triage prompt against the local model. Mark the resolution path in the response so callers can tell it was a fallback.
  5. On local failure: return JobOutcome::Defer (separate issue) with a short wake-up so the next tick retries the whole chain. Today's "hard fail" is the wrong default for a transient blocker.

Path tracking matters: the orchestrator should know whether triage hit cloud, fell back to local, or deferred — so it can color-code degraded turns and surface the state in /debug views.

Acceptance criteria

  • Tiered fallback — cloud → retry → local → defer chain implemented in triage/evaluator.rs.
  • Cloud retry budget — at most 2 attempts (initial + 1 retry) before fallthrough; Retry-After honored.
  • Resolution path is observable — response carries which arm produced it (cloud / cloud-after-retry / local-fallback / deferred).
  • Tests — happy cloud, 429 → retry → ok, 429 → retry → 429 → local fallback, cloud 5xx → local fallback, local fail → defer.
  • Diff coverage ≥ 80% — implementing PR meets the changed-lines coverage gate (Vitest + cargo-llvm-cov, enforced by .github/workflows/coverage.yml).

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions