fix(agent/triage): tiered cloud → retry → local → defer fallback #1367
Cloud failures (429, 5xx, timeout) now retry once, honouring Retry-After, before falling through to a local arm; if both fail the run yields a `TriageOutcome::Deferred` so the caller can reschedule instead of hard-failing user-visible triggers. Fixes tinyhumansai#1257
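The chain described above can be sketched in miniature. The names below mirror the PR (`TriageOutcome`-style outcome, `ArmError`, the resolution-path labels), but the synchronous signatures are illustrative stand-ins for the real async `try_arm`/`run_triage_with_arms`, and the sketch omits the Retry-After backoff sleep:

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    Decision(&'static str), // resolution_path label
    Deferred(String),       // reason the run was parked for later
}

enum ArmError {
    Fatal(String),     // auth / missing model / fatal parse — bubble up
    Retryable(String), // 429 / 5xx / timeout — worth one more attempt
}

// cloud → retry once → local → defer, per the PR description.
fn run_chain(
    cloud: &mut dyn FnMut() -> Result<(), ArmError>,
    local: Option<&mut dyn FnMut() -> Result<(), ArmError>>,
) -> Result<Outcome, String> {
    for path in ["cloud", "cloud-after-retry"] {
        match cloud() {
            Ok(()) => return Ok(Outcome::Decision(path)),
            Err(ArmError::Fatal(e)) => return Err(e), // local wouldn't help
            Err(ArmError::Retryable(_)) => {} // retry once, then fall through
        }
    }
    match local {
        None => Ok(Outcome::Deferred("local arm unavailable".into())),
        Some(arm) => match arm() {
            Ok(()) => Ok(Outcome::Decision("local-fallback")),
            Err(ArmError::Fatal(e)) | Err(ArmError::Retryable(e)) => {
                Ok(Outcome::Deferred(format!("cloud + local both failed: {e}")))
            }
        },
    }
}
```

Fatal cloud errors short-circuit before the local arm is ever tried, which is the same ordering the acceptance criteria call for.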
📝 Walkthrough

This PR implements a tiered fallback chain for triage evaluation as specified in the issue: Triage Tiered Fallback with Deferred Outcomes.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 5 passed
🧹 Nitpick comments (7)
src/openhuman/webhooks/ops.rs (1)
183-183: ⚡ Quick win — Consider extracting the hardcoded timeout constant.

The 60-second timeout for triage and `apply_decision` is hardcoded here and also appears in `src/openhuman/webhooks/bus.rs` (line 130). Extracting this to a named constant (e.g., `TRIAGE_TIMEOUT_SECS`) would improve maintainability and make the timeout budget more discoverable.

♻️ Optional refactor to extract timeout constant

At the top of the file:

```diff
+/// Timeout for triage evaluation and decision application (seconds).
+const TRIAGE_TIMEOUT_SECS: u64 = 60;
+
 use crate::api::config::effective_api_url;
```

Then use it:

```diff
 let outcome = tokio::time::timeout(
-    std::time::Duration::from_secs(60),
+    std::time::Duration::from_secs(TRIAGE_TIMEOUT_SECS),
     crate::openhuman::agent::triage::run_triage(&envelope),
 )
 .await
-.map_err(|_| "triage timed out after 60s".to_string())?
+.map_err(|_| format!("triage timed out after {}s", TRIAGE_TIMEOUT_SECS))?
```

Also applies to: 192-197
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `src/openhuman/webhooks/ops.rs` at line 183: replace the hardcoded `std::time::Duration::from_secs(60)` used for the triage and `apply_decision` timeouts with a named constant (e.g., `TRIAGE_TIMEOUT_SECS`) so the timeout is discoverable and reusable; define `const TRIAGE_TIMEOUT_SECS: u64 = 60` at a shared location (either at the top of `src/openhuman/webhooks/ops.rs` or in a common module if you also update `src/openhuman/webhooks/bus.rs`) and then construct durations with `std::time::Duration::from_secs(TRIAGE_TIMEOUT_SECS)` wherever triage or `apply_decision` timeouts are set.

src/openhuman/agent/triage/evaluator.rs (4)
1-519: ⚖️ Poor tradeoff — Heads up: file is now ~519 lines, just over the ~500-line guideline.

Doc-comments, classification helpers (`classify_error`, `is_transient_string`), and the new fallback driver have pushed `evaluator.rs` past the soft cap. A natural split is `classify.rs` (the `ArmError`, `classify_error`, `is_transient_string`, `now_ms` helpers), leaving the driver, `try_arm`, and prompt rendering here. Not a blocker for this PR.

As per coding guidelines: "Keep source files to ≤ ~500 lines per file; split modules when growing larger."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/triage/evaluator.rs` around lines 1 - 519, The file exceeds the ~500-line guideline; split classification helpers into a new module: extract the ArmError enum and the functions classify_error, is_transient_string, and now_ms into a new classify.rs, leaving try_arm, run_triage*, prompt rendering and driver code in evaluator.rs; in evaluator.rs replace the moved definitions with "use crate::openhuman::agent::triage::classify::{ArmError, classify_error, is_transient_string, now_ms};" (or equivalent path), add a mod classify; to keep tests/builds working update any imports/re-exports and the module tree (lib.rs or mod file) so the new classify.rs is compiled. Ensure visibility (pub as needed) matches original usage and run unit tests.
188-208: ⚡ Quick win — Cloud retry-exhaustion error is dropped before the no-local Deferred path.

When the second cloud attempt returns `ArmError::Retryable`, the `source: anyhow::Error` is discarded (the `tracing::warn!` at 190-193 doesn't even include it), and if `local` is `None` the Deferred at 204-207 reports only `"cloud retry exhausted; local arm unavailable"`. That makes operator-visible deferrals on stuck cloud upstreams effectively opaque — a 5xx and a connection reset look identical in the wake-up reason. Capture the second-attempt error and thread it into both the warn log and the Deferred `reason`:

♻️ Proposed fix

```diff
 match try_arm(&cloud, envelope, TriageResolutionPath::CloudAfterRetry).await {
     Ok(run) => return Ok(TriageOutcome::Decision(run)),
     Err(ArmError::Fatal(err)) => return Err(err),
-    Err(ArmError::Retryable { .. }) => {
-        // Exhausted cloud budget — fall through to local.
-        tracing::warn!(
-            "[triage::evaluator] cloud retry budget exhausted; \
-             falling back to local arm"
-        );
-    }
+    Err(ArmError::Retryable { source, .. }) => {
+        // Exhausted cloud budget — fall through to local.
+        tracing::warn!(
+            error = %source,
+            "[triage::evaluator] cloud retry budget exhausted; \
+             falling back to local arm"
+        );
+        cloud_failure = Some(source);
+    }
 }
@@
 let Some(local) = local else {
     // No local arm available at all (runtime disabled, no model
     // configured) — the only honest outcome is a deferral so the
     // next tick retries the whole chain.
+    let cause = cloud_failure
+        .map(|e| e.to_string())
+        .unwrap_or_else(|| "unknown".to_string());
     return Ok(TriageOutcome::Deferred {
         defer_until_ms: now_ms().saturating_add(DEFER_WAKEUP_MS),
-        reason: "cloud retry exhausted; local arm unavailable".to_string(),
+        reason: format!(
+            "cloud retry exhausted ({cause}); local arm unavailable"
+        ),
     });
 };
```

(plus a `let mut cloud_failure: Option<anyhow::Error> = None;` near the top of `run_triage_with_arms`).

The existing test `no_local_arm_returns_deferred_after_cloud_exhaustion` asserts `reason.contains("local arm unavailable")`, which the suggestion preserves.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/triage/evaluator.rs` around lines 188 - 208, In run_triage_with_arms, capture the Retryable error from the second cloud attempt instead of dropping it: introduce a let mut cloud_failure: Option<anyhow::Error> = None and when matching Err(ArmError::Retryable { source, .. }) assign cloud_failure = Some(source); update the tracing::warn! call that currently logs "[triage::evaluator] cloud retry budget exhausted; falling back to local arm" to include the captured error (if any), and when returning TriageOutcome::Deferred for the no-local path, append or interpolate the cloud_failure's error message into the reason string (preserving the existing "local arm unavailable" text so tests like no_local_arm_returns_deferred_after_cloud_exhaustion still pass).
214-231: ⚡ Quick win — The `cloud + local both failed` reason only carries the local error.

The `format!("cloud + local both failed: {err}")` interpolates the captured local arm error, but the cloud error was already dropped at the cloud-exhaustion site above. The wording promises both; the value contains one. Combined with the fix suggested for lines 188-208, you can include both causes here:

♻️ Proposed fix

```diff
 Err(ArmError::Fatal(err)) | Err(ArmError::Retryable { source: err, .. }) => {
     // Local also failed — defer rather than surface a hard
     // error. Today's "hard fail" is the wrong default for a
     // transient blocker per `#1257`.
-    let reason = format!("cloud + local both failed: {err}");
+    let cloud_cause = cloud_failure
+        .map(|e| e.to_string())
+        .unwrap_or_else(|| "<not captured>".to_string());
+    let reason = format!("cloud failed ({cloud_cause}); local failed: {err}");
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/triage/evaluator.rs` around lines 214 - 231, The current Deferred reason only includes the local arm error; capture and propagate the cloud arm error from the cloud-exhaustion site so the LocalFallback match can include both causes: adjust the cloud-exhaustion code (the earlier try_arm call/handling around TriageResolutionPath::Cloud) to preserve the cloud error into a named variable (e.g., cloud_err) and return or attach it inside the ArmError/Retryable path so the later match on try_arm(&local, ..., TriageResolutionPath::LocalFallback) can access both errors; then build the reason as something like format!("cloud + local both failed: cloud: {cloud_err}; local: {err}") before emitting the tracing::warn and returning TriageOutcome::Deferred (use the existing symbols try_arm, ArmError::{Fatal, Retryable}, TriageOutcome::Deferred, now_ms(), and DEFER_WAKEUP_MS).
404-429: 💤 Low value — `is_transient_string` is sensible; one minor 5xx tokenization edge.

The digit-tokenization loop at 421-427 will also flip to `true` for any unrelated number that happens to land in `[500, 600)` — e.g., a port literal `:5432` won't match (out of range) but `:502` would, and request IDs like `req-573-…` would too. In practice agent-bus error strings are well-formed enough that this is unlikely to matter, but anchoring the digits to typical HTTP-status shapes (e.g., `HTTP 5xx`, `status 5xx`, `5xx`) would make the heuristic less likely to drift over time. Optional.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `src/openhuman/agent/triage/evaluator.rs` around lines 404-429: the digit-tokenization in `is_transient_string` is too broad and treats any numeric token in 500..600 as a transient HTTP 5xx; narrow this by only accepting 5xx matches that are anchored to HTTP/status-style contexts or whole-word HTTP codes. Update the digit-tokenization loop (the block that iterates over `lower.split(...)` and parses tokens) to instead search for 5xx using a stricter rule — e.g., a regex or explicit checks that require a surrounding word boundary (`\b5\d{2}\b`) or the presence of nearby cues like "http", "status", "code", or "error" before/around the digits — so random numeric IDs/ports (e.g., `req-573` or `:5432`) no longer trigger `true`. Ensure the function continues to return `true` for genuine 5xx indications and `false` otherwise.

src/openhuman/agent/triage/routing.rs (2)
92-105: 💤 Low value — Local-arm `label`/`provider_name` is misleading for non-llamacpp + override combos.

When `OPENHUMAN_LOCAL_INFERENCE_URL` is set, `use_openai_compat` becomes `true` regardless of `provider_kind`, but the label collapses to `"llamacpp"` for everything that is not `"custom_openai"` — including `"ollama"`, `"llama-server"`, or unknown values. That `label` is then surfaced as `provider_name` in `ResolvedProvider` and in the `tracing::debug!` line below, so an Ollama-with-override deployment will show up in logs/telemetry as `llamacpp`, which makes the new `resolution_path` instrumentation harder to trust on degraded turns.

Consider preserving the configured kind (or at least distinguishing `llama-server` from `llamacpp`):

♻️ Proposed fix

```diff
 let (label, base) = if use_openai_compat {
     let base = override_base
         .or_else(|| local_cfg.base_url.clone())
         .unwrap_or_else(|| "http://127.0.0.1:8080/v1".to_string());
-    let label = if provider_kind == "custom_openai" {
-        "custom_openai"
-    } else {
-        "llamacpp"
-    };
+    let label: &str = match provider_kind.as_str() {
+        "custom_openai" => "custom_openai",
+        "llama-server" => "llama-server",
+        "llamacpp" => "llamacpp",
+        _ => "openai_compat",
+    };
     (label, base)
 } else {
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/triage/routing.rs` around lines 92 - 105, The current branch that sets (label, base) when use_openai_compat is true collapses every non-"custom_openai" provider_kind into "llamacpp", causing mislabeling in ResolvedProvider/tracing; change the logic in the block that assigns label (the let (label, base) = ... expression) so that label preserves the original provider_kind (or maps only "custom_openai" → "custom_openai") instead of defaulting to "llamacpp" — e.g., set label = provider_kind.clone() or match provider_kind to explicitly preserve "ollama", "llama-server", "llamacpp", etc.; keep the base selection behavior using override_base.or_else(...) as-is.
81-105: 💤 Low value — Document the `OPENHUMAN_LOCAL_INFERENCE_URL` contract to clarify whether the `/v1` path component is required.

`OpenAiCompatibleProvider::new` only trims trailing slashes and does not append `/v1` itself. The `chat_completions_url()` method (`src/openhuman/providers/compatible.rs:246-264`) appends `/chat/completions` only if the base URL doesn't already contain that endpoint, but it never normalizes or auto-appends `/v1`. This differs from the Ollama branch, which explicitly appends `/v1`, and the hard-coded default, which includes it. If a user sets `OPENHUMAN_LOCAL_INFERENCE_URL=http://host:8080` without the `/v1` suffix, the client will POST to `http://host:8080/chat/completions` instead of the expected `/v1/chat/completions`, causing triage failures. Either enforce the `/v1` suffix in code or document that the env var must include it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/triage/routing.rs` around lines 81 - 105, The env var OPENHUMAN_LOCAL_INFERENCE_URL is currently only trimmed of trailing slashes in the routing logic (see override_base and use_openai_compat) but the code path for OpenAiCompatibleProvider (OpenAiCompatibleProvider::new and chat_completions_url) does not normalize or append a /v1 segment, causing mismatched endpoints compared to the ollama branch; fix by normalizing override_base to ensure it always ends with "/v1" (unless it already contains a path like "/v1" or "/v1/") before it is used or passed into OpenAiCompatibleProvider, so that local_cfg.base_url/or override_base yields a base that will produce requests to /v1/chat/completions consistently across use_openai_compat and ollama branches.
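One way to enforce that contract in code — an illustrative normalizer, not taken from the PR:

```rust
/// Normalize an override base URL so it always ends with "/v1",
/// matching the hard-coded default and the Ollama branch.
fn normalize_base(url: &str) -> String {
    let trimmed = url.trim_end_matches('/');
    if trimmed.ends_with("/v1") {
        trimmed.to_string()
    } else {
        format!("{trimmed}/v1")
    }
}
```

Applying this to `override_base` before it reaches `OpenAiCompatibleProvider` would make the `use_openai_compat` and Ollama branches produce consistent `/v1/chat/completions` requests; documenting the env var is the lighter-touch alternative.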
📒 Files selected for processing (11)

- src/openhuman/agent/schemas.rs
- src/openhuman/agent/triage/escalation.rs
- src/openhuman/agent/triage/evaluator.rs
- src/openhuman/agent/triage/evaluator_tests.rs
- src/openhuman/agent/triage/mod.rs
- src/openhuman/agent/triage/routing.rs
- src/openhuman/composio/bus.rs
- src/openhuman/notifications/rpc.rs
- src/openhuman/providers/reliable.rs
- src/openhuman/webhooks/bus.rs
- src/openhuman/webhooks/ops.rs
Summary
Fixes #1257.
Replaces triage's single-attempt cloud-only routing with the tiered fallback chain the issue called for. Non-transient cloud failures (auth, missing model, fatal parse error) bubble up immediately — the local arm wouldn't help.
Changes
Core (`src/openhuman/agent/triage/`)

- `evaluator.rs` — rewritten around `try_arm` (single-arm attempt + error classification) and `run_triage_with_arms` (orchestrator). Adds:
  - `pub enum TriageResolutionPath { Cloud, CloudAfterRetry, LocalFallback }` with `as_str()`.
  - `pub struct TriageRun { …, resolution_path }` (acceptance criteria #3).
  - `pub enum TriageOutcome { Decision(TriageRun), Deferred { defer_until_ms, reason } }` — surfaces deferral to callers (acceptance criteria #1).
  - `classify_error` + `is_transient_string` helpers that recognise 429, 5xx, timeouts, and connection errors. Reuses `providers::reliable::{is_rate_limited, is_upstream_unhealthy, parse_retry_after_ms}` (visibility bumped to `pub(crate)`).
  - `RETRY_AFTER_CAP`, 500 ms `TRANSIENT_BACKOFF`, and 30s `DEFER_WAKEUP_MS` constants.
- `routing.rs` — adds `build_local_provider_with_config(&Config) -> Option<ResolvedProvider>` mirroring the wiring in `routing::factory::new_provider`'s local arm (Ollama by default, `OPENHUMAN_LOCAL_INFERENCE_URL` overridable). Returns `None` when local AI is disabled or unconfigured, in which case the chain skips straight to `Deferred`.
- `mod.rs` — re-exports `TriageOutcome`, `TriageResolutionPath`, `build_local_provider_with_config`.

Caller updates (4 sites — all match `TriageOutcome`)

- `agent/schemas.rs` — `agent.run_trigger` RPC: `Deferred` returns a JSON shape with `decision: "deferred"`, `defer_until_ms`, `reason`; `Decision` is unchanged but adds `resolution_path` to the response payload.
- `webhooks/ops.rs` (`webhooks.trigger_agent`) — same pattern, plus `resolution_path` on the success payload.
- `webhooks/bus.rs` (`run_agent_trigger` summary) — formats `Triage deferred until …: …` for the bus reply on deferral.
- `notifications/rpc.rs` (notification triage) — logs deferral with `tracing::warn!` (the next ingest re-enters the chain on its own).
- `composio/bus.rs` — same logging-only deferral handling (composio bus owns no scheduler).

Tests (acceptance criteria #4)
All five required scenarios are covered as `mock_agent_run_turn` integration tests in `evaluator_tests.rs`:

- `happy_path_returns_cloud_resolution`
- `rate_limited_then_ok_marks_cloud_after_retry`
- `double_429_falls_through_to_local_fallback`
- `cloud_5xx_falls_through_to_local_fallback`
- `cloud_then_local_failure_returns_deferred`
- `fatal_cloud_error_short_circuits_without_local_attempt`
- `no_local_arm_returns_deferred_after_cloud_exhaustion` (`local: None` → `Deferred`)

Plus four pure classifier tests covering `classify_error` (429 + Retry-After parsing, 5xx, timeout, auth-fatal).

PR Submission Checklist
- `agent.run_trigger` and `webhooks.trigger_agent` responses now include `resolution_path`; new `"deferred"` decision shape with `defer_until_ms` + `reason`. Additive — existing keys unchanged on the `Decision` path.
- The `TriageOutcome::Deferred` arm is exercised by both `cloud_then_local_failure_returns_deferred` and `no_local_arm_returns_deferred_after_cloud_exhaustion`.
- `evaluator.rs` documents the chain; rustdoc on each new public type.

Acceptance criteria

- Tiered fallback chain in `triage/evaluator.rs`.
- `Retry-After` honored (capped at 30s).
- `resolution_path` carries `cloud` / `cloud-after-retry` / `local-fallback`; the `"deferred"` shape is the deferred state.

Related

- `JobOutcome::Defer` (#1345) — landed; the `TriageOutcome::Deferred` shape mirrors that contract for the triage caller surface.
- `Retry-After` is parsed directly from the upstream error string, the same shape `ReliableProvider::compute_backoff` already handles.
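The Retry-After handling named in the acceptance criteria boils down to a delay choice like the sketch below. The constant names echo the PR's `RETRY_AFTER_CAP` and `TRANSIENT_BACKOFF`; the `_MS` spellings and the function itself are hypothetical:

```rust
const RETRY_AFTER_CAP_MS: u64 = 30_000; // Retry-After honoured, capped at 30s
const TRANSIENT_BACKOFF_MS: u64 = 500;  // used when no Retry-After is parsed

/// Delay before the single cloud retry: the upstream's Retry-After when
/// present (capped), otherwise the short fixed transient backoff.
fn retry_delay_ms(retry_after_ms: Option<u64>) -> u64 {
    match retry_after_ms {
        Some(ms) => ms.min(RETRY_AFTER_CAP_MS),
        None => TRANSIENT_BACKOFF_MS,
    }
}
```

The cap keeps a hostile or misconfigured `Retry-After: 3600` from stalling a user-visible trigger for an hour before the local fallback even gets a chance.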