fix(agent/triage): tiered cloud → retry → local → defer fallback #1367
Cloud failures (429, 5xx, timeout) now retry once, honouring Retry-After, before falling through to a local arm; if both fail the run yields a `TriageOutcome::Deferred` so the caller can reschedule instead of hard-failing user-visible triggers. Fixes tinyhumansai#1257
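The chain described above can be sketched in miniature. The names below mirror the PR (`TriageOutcome`-style outcome, `ArmError`, the resolution-path labels), but the synchronous signatures are illustrative stand-ins for the real async `try_arm`/`run_triage_with_arms`, and the sketch omits the Retry-After backoff sleep:

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    Decision(&'static str), // resolution_path label
    Deferred(String),       // reason the run was parked for later
}

enum ArmError {
    Fatal(String),     // auth / missing model / fatal parse — bubble up
    Retryable(String), // 429 / 5xx / timeout — worth one more attempt
}

// cloud → retry once → local → defer, per the PR description.
fn run_chain(
    cloud: &mut dyn FnMut() -> Result<(), ArmError>,
    local: Option<&mut dyn FnMut() -> Result<(), ArmError>>,
) -> Result<Outcome, String> {
    for path in ["cloud", "cloud-after-retry"] {
        match cloud() {
            Ok(()) => return Ok(Outcome::Decision(path)),
            Err(ArmError::Fatal(e)) => return Err(e), // local wouldn't help
            Err(ArmError::Retryable(_)) => {} // retry once, then fall through
        }
    }
    match local {
        None => Ok(Outcome::Deferred("local arm unavailable".into())),
        Some(arm) => match arm() {
            Ok(()) => Ok(Outcome::Decision("local-fallback")),
            Err(ArmError::Fatal(e)) | Err(ArmError::Retryable(e)) => {
                Ok(Outcome::Deferred(format!("cloud + local both failed: {e}")))
            }
        },
    }
}
```

Fatal cloud errors short-circuit before the local arm is ever tried, which is the same ordering the acceptance criteria call for.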
📝 Walkthrough

This PR implements a tiered fallback chain for triage evaluation as specified in the issue: Triage Tiered Fallback with Deferred Outcomes.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 5 passed
🧹 Nitpick comments (7)
src/openhuman/webhooks/ops.rs (1)
183-183: ⚡ Quick win — Consider extracting the hardcoded timeout constant.

The 60-second timeout for triage and `apply_decision` is hardcoded here and also appears in `src/openhuman/webhooks/bus.rs` (line 130). Extracting this to a named constant (e.g., `TRIAGE_TIMEOUT_SECS`) would improve maintainability and make the timeout budget more discoverable.

♻️ Optional refactor to extract timeout constant

At the top of the file:

```diff
+/// Timeout for triage evaluation and decision application (seconds).
+const TRIAGE_TIMEOUT_SECS: u64 = 60;
+
 use crate::api::config::effective_api_url;
```

Then use it:

```diff
 let outcome = tokio::time::timeout(
-    std::time::Duration::from_secs(60),
+    std::time::Duration::from_secs(TRIAGE_TIMEOUT_SECS),
     crate::openhuman::agent::triage::run_triage(&envelope),
 )
 .await
-.map_err(|_| "triage timed out after 60s".to_string())?
+.map_err(|_| format!("triage timed out after {}s", TRIAGE_TIMEOUT_SECS))?
```

Also applies to: 192-197
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `src/openhuman/webhooks/ops.rs` at line 183: replace the hardcoded `std::time::Duration::from_secs(60)` used for the triage and `apply_decision` timeouts with a named constant (e.g., `TRIAGE_TIMEOUT_SECS`) so the timeout is discoverable and reusable; define `const TRIAGE_TIMEOUT_SECS: u64 = 60` at a shared location (either at the top of `src/openhuman/webhooks/ops.rs` or in a common module if you also update `src/openhuman/webhooks/bus.rs`) and then construct durations with `std::time::Duration::from_secs(TRIAGE_TIMEOUT_SECS)` wherever triage or `apply_decision` timeouts are set.

src/openhuman/agent/triage/evaluator.rs (4)
1-519: ⚖️ Poor tradeoff — Heads up: file is now ~519 lines, just over the ~500-line guideline.

Doc-comments, classification helpers (`classify_error`, `is_transient_string`), and the new fallback driver have pushed `evaluator.rs` past the soft cap. A natural split is `classify.rs` (the `ArmError`, `classify_error`, `is_transient_string`, `now_ms` helpers), leaving the driver, `try_arm`, and prompt rendering here. Not a blocker for this PR.

As per coding guidelines: "Keep source files to ≤ ~500 lines per file; split modules when growing larger."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/triage/evaluator.rs` around lines 1 - 519, The file exceeds the ~500-line guideline; split classification helpers into a new module: extract the ArmError enum and the functions classify_error, is_transient_string, and now_ms into a new classify.rs, leaving try_arm, run_triage*, prompt rendering and driver code in evaluator.rs; in evaluator.rs replace the moved definitions with "use crate::openhuman::agent::triage::classify::{ArmError, classify_error, is_transient_string, now_ms};" (or equivalent path), add a mod classify; to keep tests/builds working update any imports/re-exports and the module tree (lib.rs or mod file) so the new classify.rs is compiled. Ensure visibility (pub as needed) matches original usage and run unit tests.
188-208: ⚡ Quick win — Cloud retry-exhaustion error is dropped before the no-local Deferred path.

When the second cloud attempt returns `ArmError::Retryable`, the `source: anyhow::Error` is discarded (the `tracing::warn!` at 190-193 doesn't even include it), and if `local` is `None` the Deferred at 204-207 reports only `"cloud retry exhausted; local arm unavailable"`. That makes operator-visible deferrals on stuck cloud upstreams effectively opaque — a 5xx and a connection reset look identical in the wake-up reason. Capture the second-attempt error and thread it into both the warn log and the Deferred `reason`:

♻️ Proposed fix

```diff
 match try_arm(&cloud, envelope, TriageResolutionPath::CloudAfterRetry).await {
     Ok(run) => return Ok(TriageOutcome::Decision(run)),
     Err(ArmError::Fatal(err)) => return Err(err),
-    Err(ArmError::Retryable { .. }) => {
-        // Exhausted cloud budget — fall through to local.
-        tracing::warn!(
-            "[triage::evaluator] cloud retry budget exhausted; \
-             falling back to local arm"
-        );
-    }
+    Err(ArmError::Retryable { source, .. }) => {
+        // Exhausted cloud budget — fall through to local.
+        tracing::warn!(
+            error = %source,
+            "[triage::evaluator] cloud retry budget exhausted; \
+             falling back to local arm"
+        );
+        cloud_failure = Some(source);
+    }
 }
@@
 let Some(local) = local else {
     // No local arm available at all (runtime disabled, no model
     // configured) — the only honest outcome is a deferral so the
     // next tick retries the whole chain.
+    let cause = cloud_failure
+        .map(|e| e.to_string())
+        .unwrap_or_else(|| "unknown".to_string());
     return Ok(TriageOutcome::Deferred {
         defer_until_ms: now_ms().saturating_add(DEFER_WAKEUP_MS),
-        reason: "cloud retry exhausted; local arm unavailable".to_string(),
+        reason: format!(
+            "cloud retry exhausted ({cause}); local arm unavailable"
+        ),
     });
 };
```

(plus a `let mut cloud_failure: Option<anyhow::Error> = None;` near the top of `run_triage_with_arms`).

The existing test `no_local_arm_returns_deferred_after_cloud_exhaustion` asserts `reason.contains("local arm unavailable")`, which the suggestion preserves.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/triage/evaluator.rs` around lines 188 - 208, In run_triage_with_arms, capture the Retryable error from the second cloud attempt instead of dropping it: introduce a let mut cloud_failure: Option<anyhow::Error> = None and when matching Err(ArmError::Retryable { source, .. }) assign cloud_failure = Some(source); update the tracing::warn! call that currently logs "[triage::evaluator] cloud retry budget exhausted; falling back to local arm" to include the captured error (if any), and when returning TriageOutcome::Deferred for the no-local path, append or interpolate the cloud_failure's error message into the reason string (preserving the existing "local arm unavailable" text so tests like no_local_arm_returns_deferred_after_cloud_exhaustion still pass).
214-231: ⚡ Quick win — The `cloud + local both failed` reason only carries the local error.

The `format!("cloud + local both failed: {err}")` interpolates the captured local arm error, but the cloud error was already dropped at the cloud-exhaustion site above. The wording promises both; the value contains one. Combined with the fix suggested for lines 188-208, you can include both causes here:

♻️ Proposed fix

```diff
 Err(ArmError::Fatal(err)) | Err(ArmError::Retryable { source: err, .. }) => {
     // Local also failed — defer rather than surface a hard
     // error. Today's "hard fail" is the wrong default for a
     // transient blocker per `#1257`.
-    let reason = format!("cloud + local both failed: {err}");
+    let cloud_cause = cloud_failure
+        .map(|e| e.to_string())
+        .unwrap_or_else(|| "<not captured>".to_string());
+    let reason = format!("cloud failed ({cloud_cause}); local failed: {err}");
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/triage/evaluator.rs` around lines 214 - 231, The current Deferred reason only includes the local arm error; capture and propagate the cloud arm error from the cloud-exhaustion site so the LocalFallback match can include both causes: adjust the cloud-exhaustion code (the earlier try_arm call/handling around TriageResolutionPath::Cloud) to preserve the cloud error into a named variable (e.g., cloud_err) and return or attach it inside the ArmError/Retryable path so the later match on try_arm(&local, ..., TriageResolutionPath::LocalFallback) can access both errors; then build the reason as something like format!("cloud + local both failed: cloud: {cloud_err}; local: {err}") before emitting the tracing::warn and returning TriageOutcome::Deferred (use the existing symbols try_arm, ArmError::{Fatal, Retryable}, TriageOutcome::Deferred, now_ms(), and DEFER_WAKEUP_MS).
404-429: 💤 Low value — `is_transient_string` is sensible; one minor 5xx tokenization edge.

The digit-tokenization loop at 421-427 will also flip to `true` for any unrelated number that happens to land in `[500, 600)` — e.g., a port literal `:5432` won't match (out of range) but `:502` would, and request IDs like `req-573-…` would too. In practice agent-bus error strings are well-formed enough that this is unlikely to matter, but anchoring the digits to typical HTTP-status shapes (e.g., `HTTP 5xx`, `status 5xx`, `5xx`) would make the heuristic less likely to drift over time. Optional.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `src/openhuman/agent/triage/evaluator.rs` around lines 404-429: the digit-tokenization in `is_transient_string` is too broad and treats any numeric token in 500..600 as a transient HTTP 5xx; narrow this by only accepting 5xx matches that are anchored to HTTP/status-style contexts or whole-word HTTP codes. Update the digit-tokenization loop (the block that iterates over `lower.split(...)` and parses tokens) to instead search for 5xx using a stricter rule — e.g., a regex or explicit checks that require a surrounding word boundary (`\b5\d{2}\b`) or the presence of nearby cues like "http", "status", "code", or "error" before/around the digits — so random numeric IDs/ports (e.g., `req-573` or `:5432`) no longer trigger `true`. Ensure the function continues to return `true` for genuine 5xx indications and `false` otherwise.

src/openhuman/agent/triage/routing.rs (2)
92-105: 💤 Low value — Local-arm `label`/`provider_name` is misleading for non-llamacpp + override combos.

When `OPENHUMAN_LOCAL_INFERENCE_URL` is set, `use_openai_compat` becomes `true` regardless of `provider_kind`, but the label collapses to `"llamacpp"` for everything that is not `"custom_openai"` — including `"ollama"`, `"llama-server"`, or unknown values. That `label` is then surfaced as `provider_name` in `ResolvedProvider` and in the `tracing::debug!` line below, so an Ollama-with-override deployment will show up in logs/telemetry as `llamacpp`, which makes the new `resolution_path` instrumentation harder to trust on degraded turns.

Consider preserving the configured kind (or at least distinguishing `llama-server` from `llamacpp`):

♻️ Proposed fix

```diff
 let (label, base) = if use_openai_compat {
     let base = override_base
         .or_else(|| local_cfg.base_url.clone())
         .unwrap_or_else(|| "http://127.0.0.1:8080/v1".to_string());
-    let label = if provider_kind == "custom_openai" {
-        "custom_openai"
-    } else {
-        "llamacpp"
-    };
+    let label: &str = match provider_kind.as_str() {
+        "custom_openai" => "custom_openai",
+        "llama-server" => "llama-server",
+        "llamacpp" => "llamacpp",
+        _ => "openai_compat",
+    };
     (label, base)
 } else {
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/triage/routing.rs` around lines 92 - 105, The current branch that sets (label, base) when use_openai_compat is true collapses every non-"custom_openai" provider_kind into "llamacpp", causing mislabeling in ResolvedProvider/tracing; change the logic in the block that assigns label (the let (label, base) = ... expression) so that label preserves the original provider_kind (or maps only "custom_openai" → "custom_openai") instead of defaulting to "llamacpp" — e.g., set label = provider_kind.clone() or match provider_kind to explicitly preserve "ollama", "llama-server", "llamacpp", etc.; keep the base selection behavior using override_base.or_else(...) as-is.
81-105: 💤 Low value — Document the `OPENHUMAN_LOCAL_INFERENCE_URL` contract to clarify whether the `/v1` path component is required.

`OpenAiCompatibleProvider::new` only trims trailing slashes and does not append `/v1` itself. The `chat_completions_url()` method (`src/openhuman/providers/compatible.rs:246-264`) appends `/chat/completions` only if the base URL doesn't already contain that endpoint, but it never normalizes or auto-appends `/v1`. This differs from the Ollama branch, which explicitly appends `/v1`, and the hard-coded default, which includes it. If a user sets `OPENHUMAN_LOCAL_INFERENCE_URL=http://host:8080` without the `/v1` suffix, the client will POST to `http://host:8080/chat/completions` instead of the expected `/v1/chat/completions`, causing triage failures. Either enforce the `/v1` suffix in code or document that the env var must include it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/triage/routing.rs` around lines 81 - 105, The env var OPENHUMAN_LOCAL_INFERENCE_URL is currently only trimmed of trailing slashes in the routing logic (see override_base and use_openai_compat) but the code path for OpenAiCompatibleProvider (OpenAiCompatibleProvider::new and chat_completions_url) does not normalize or append a /v1 segment, causing mismatched endpoints compared to the ollama branch; fix by normalizing override_base to ensure it always ends with "/v1" (unless it already contains a path like "/v1" or "/v1/") before it is used or passed into OpenAiCompatibleProvider, so that local_cfg.base_url/or override_base yields a base that will produce requests to /v1/chat/completions consistently across use_openai_compat and ollama branches.
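One way to enforce that contract in code — an illustrative normalizer, not taken from the PR:

```rust
/// Normalize an override base URL so it always ends with "/v1",
/// matching the hard-coded default and the Ollama branch.
fn normalize_base(url: &str) -> String {
    let trimmed = url.trim_end_matches('/');
    if trimmed.ends_with("/v1") {
        trimmed.to_string()
    } else {
        format!("{trimmed}/v1")
    }
}
```

Applying this to `override_base` before it reaches `OpenAiCompatibleProvider` would make the `use_openai_compat` and Ollama branches produce consistent `/v1/chat/completions` requests; documenting the env var is the lighter-touch alternative.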
📒 Files selected for processing (11)

- src/openhuman/agent/schemas.rs
- src/openhuman/agent/triage/escalation.rs
- src/openhuman/agent/triage/evaluator.rs
- src/openhuman/agent/triage/evaluator_tests.rs
- src/openhuman/agent/triage/mod.rs
- src/openhuman/agent/triage/routing.rs
- src/openhuman/composio/bus.rs
- src/openhuman/notifications/rpc.rs
- src/openhuman/providers/reliable.rs
- src/openhuman/webhooks/bus.rs
- src/openhuman/webhooks/ops.rs
Summary
Fixes #1257.
Replaces triage's single-attempt cloud-only routing with the tiered fallback chain the issue called for. Non-transient cloud failures (auth, missing model, fatal parse error) bubble up immediately — the local arm wouldn't help.
Changes
Core (`src/openhuman/agent/triage/`)

- `evaluator.rs` — rewritten around `try_arm` (single-arm attempt + error classification) and `run_triage_with_arms` (orchestrator). Adds:
  - `pub enum TriageResolutionPath { Cloud, CloudAfterRetry, LocalFallback }` with `as_str()`.
  - `pub struct TriageRun { …, resolution_path }` (acceptance criteria #3).
  - `pub enum TriageOutcome { Decision(TriageRun), Deferred { defer_until_ms, reason } }` — surfaces deferral to callers (acceptance criteria #1).
  - `classify_error` + `is_transient_string` helpers that recognise 429, 5xx, timeouts, and connection errors. Reuses `providers::reliable::{is_rate_limited, is_upstream_unhealthy, parse_retry_after_ms}` (visibility bumped to `pub(crate)`).
  - `RETRY_AFTER_CAP`, 500 ms `TRANSIENT_BACKOFF`, and 30s `DEFER_WAKEUP_MS` constants.
- `routing.rs` — adds `build_local_provider_with_config(&Config) -> Option<ResolvedProvider>` mirroring the wiring in `routing::factory::new_provider`'s local arm (Ollama by default, `OPENHUMAN_LOCAL_INFERENCE_URL` overridable). Returns `None` when local AI is disabled or unconfigured, in which case the chain skips straight to `Deferred`.
- `mod.rs` — re-exports `TriageOutcome`, `TriageResolutionPath`, `build_local_provider_with_config`.

Caller updates (4 sites — all match `TriageOutcome`)

- `agent/schemas.rs` — `agent.run_trigger` RPC: `Deferred` returns a JSON shape with `decision: "deferred"`, `defer_until_ms`, `reason`; `Decision` is unchanged but adds `resolution_path` to the response payload.
- `webhooks/ops.rs` (`webhooks.trigger_agent`) — same pattern, plus `resolution_path` on the success payload.
- `webhooks/bus.rs` (`run_agent_trigger` summary) — formats `Triage deferred until …: …` for the bus reply on deferral.
- `notifications/rpc.rs` (notification triage) — logs deferral with `tracing::warn!` (the next ingest re-enters the chain on its own).
- `composio/bus.rs` — same logging-only deferral handling (composio bus owns no scheduler).

Tests (acceptance criteria #4)
All five required scenarios are covered as `mock_agent_run_turn` integration tests in `evaluator_tests.rs`:

- `happy_path_returns_cloud_resolution`
- `rate_limited_then_ok_marks_cloud_after_retry`
- `double_429_falls_through_to_local_fallback`
- `cloud_5xx_falls_through_to_local_fallback`
- `cloud_then_local_failure_returns_deferred`
- `fatal_cloud_error_short_circuits_without_local_attempt`
- `no_local_arm_returns_deferred_after_cloud_exhaustion` (`local: None` → `Deferred`)

Plus four pure classifier tests covering `classify_error` (429 + Retry-After parsing, 5xx, timeout, auth-fatal).

PR Submission Checklist
- `agent.run_trigger` and `webhooks.trigger_agent` responses now include `resolution_path`; new `"deferred"` decision shape with `defer_until_ms` + `reason`. Additive — existing keys unchanged on the `Decision` path.
- The `TriageOutcome::Deferred` arm is exercised by both `cloud_then_local_failure_returns_deferred` and `no_local_arm_returns_deferred_after_cloud_exhaustion`.
- `evaluator.rs` documents the chain; rustdoc on each new public type.

Acceptance criteria

- Tiered fallback chain in `triage/evaluator.rs`.
- `Retry-After` honored (capped at 30s).
- `resolution_path` carries `cloud` / `cloud-after-retry` / `local-fallback`; the `"deferred"` shape is the deferred state.

Related

- `JobOutcome::Defer` (#1345) — landed; the `TriageOutcome::Deferred` shape mirrors that contract for the triage caller surface.
- `Retry-After` is parsed directly from the upstream error string, the same shape `ReliableProvider::compute_backoff` already handles.
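The Retry-After handling named in the acceptance criteria boils down to a delay choice like the sketch below. The constant names echo the PR's `RETRY_AFTER_CAP` and `TRANSIENT_BACKOFF`; the `_MS` spellings and the function itself are hypothetical:

```rust
const RETRY_AFTER_CAP_MS: u64 = 30_000; // Retry-After honoured, capped at 30s
const TRANSIENT_BACKOFF_MS: u64 = 500;  // used when no Retry-After is parsed

/// Delay before the single cloud retry: the upstream's Retry-After when
/// present (capped), otherwise the short fixed transient backoff.
fn retry_delay_ms(retry_after_ms: Option<u64>) -> u64 {
    match retry_after_ms {
        Some(ms) => ms.min(RETRY_AFTER_CAP_MS),
        None => TRANSIENT_BACKOFF_MS,
    }
}
```

The cap keeps a hostile or misconfigured `Retry-After: 3600` from stalling a user-visible trigger for an hour before the local fallback even gets a chance.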