Skip to content

fix(agent/triage): tiered cloud → retry → local → defer fallback#1367

Merged
senamakel merged 1 commit intotinyhumansai:mainfrom
obchain:fix/1257-triage-tiered-fallback
May 9, 2026
Merged

fix(agent/triage): tiered cloud → retry → local → defer fallback#1367
senamakel merged 1 commit intotinyhumansai:mainfrom
obchain:fix/1257-triage-tiered-fallback

Conversation

@obchain
Copy link
Copy Markdown
Contributor

@obchain obchain commented May 8, 2026

Summary

Fixes #1257.

Replaces triage's single-attempt cloud-only routing with the tiered fallback chain the issue called for:

cloud (initial)
  ├── 429 / transient (5xx / timeout / connection) ──► retry once
  │       └── still failing ──► local fallback
  └── ok ──► resolution_path = Cloud | CloudAfterRetry

local fallback (acquires global LlmPermit per #1073)
  ├── ok ──► resolution_path = LocalFallback
  └── failed ──► TriageOutcome::Deferred { defer_until_ms, reason }

Non-transient cloud failures (auth, missing model, fatal parse error) bubble up immediately — the local arm wouldn't help.

Changes

Core (src/openhuman/agent/triage/)

  • evaluator.rs — rewritten around try_arm (single-arm attempt + error classification) and run_triage_with_arms (orchestrator). Adds:
    • pub enum TriageResolutionPath { Cloud, CloudAfterRetry, LocalFallback } with as_str().
    • pub struct TriageRun { …, resolution_path } (acceptance criteria Feat/landing revamp #3).
    • pub enum TriageOutcome { Decision(TriageRun), Deferred { defer_until_ms, reason } } — surfaces deferral to callers (acceptance criteria Feat/gitbooks #1).
    • classify_error + is_transient_string helpers that recognise 429, 5xx, timeouts, and connection errors. Reuses providers::reliable::{is_rate_limited, is_upstream_unhealthy, parse_retry_after_ms} (visibility bumped to pub(crate)).
    • 30s RETRY_AFTER_CAP, 500 ms TRANSIENT_BACKOFF, 30s DEFER_WAKEUP_MS constants.
  • routing.rs — adds build_local_provider_with_config(&Config) -> Option<ResolvedProvider> mirroring the wiring in routing::factory::new_provider's local arm (Ollama by default, OPENHUMAN_LOCAL_INFERENCE_URL overridable). Returns None when local AI is disabled or unconfigured, in which case the chain skips straight to Deferred.
  • mod.rs — re-exports TriageOutcome, TriageResolutionPath, build_local_provider_with_config.

Caller updates (4 sites — all match TriageOutcome)

  • agent/schemas.rsagent.run_trigger RPC: Deferred returns a JSON shape with decision: "deferred", defer_until_ms, reason; Decision is unchanged but adds resolution_path to the response payload.
  • webhooks/ops.rs (webhooks.trigger_agent) — same pattern, plus resolution_path on the success payload.
  • webhooks/bus.rs (run_agent_trigger summary) — formats Triage deferred until …: … for the bus reply on deferral.
  • notifications/rpc.rs (notification triage) — logs deferral with tracing::warn! (the next ingest re-enters the chain on its own).
  • composio/bus.rs — same logging-only deferral handling (composio bus owns no scheduler).

Tests (acceptance criteria #4)

All five required scenarios are covered as mock_agent_run_turn integration tests in evaluator_tests.rs:

Test Path
happy_path_returns_cloud_resolution Cloud
rate_limited_then_ok_marks_cloud_after_retry 429 → ok → CloudAfterRetry
double_429_falls_through_to_local_fallback 429 → 429 → LocalFallback
cloud_5xx_falls_through_to_local_fallback 5xx → 5xx → LocalFallback
cloud_then_local_failure_returns_deferred 5xx → 5xx → 503 (local) → Deferred
fatal_cloud_error_short_circuits_without_local_attempt 401 → Err (no retry, no fallback)
no_local_arm_returns_deferred_after_cloud_exhaustion cloud exhausted, local: None → Deferred

Plus four pure classifier tests covering classify_error (429 + Retry-After parsing, 5xx, timeout, auth-fatal).

cargo test --lib openhuman::agent::triage::
  46 passed; 0 failed
cargo test --lib openhuman::agent::
  504 passed; 0 failed

PR Submission Checklist

  • N/A: no UI changes.
  • RPC surface: agent.run_trigger and webhooks.trigger_agent responses now include resolution_path; new "deferred" decision shape with defer_until_ms + reason. Additive — existing keys unchanged on the Decision path.
  • Coverage: 11 new evaluator tests cover all five tiered-fallback paths plus error-classification edge cases, and the new TriageOutcome::Deferred arm is exercised by both cloud_then_local_failure_returns_deferred and no_local_arm_returns_deferred_after_cloud_exhaustion.
  • Docs: module-level docstring on evaluator.rs documents the chain; rustdoc on each new public type.

Acceptance criteria

  • Tiered fallback — cloud → retry → local → defer chain implemented in triage/evaluator.rs.
  • Cloud retry budget — at most 2 attempts (initial + 1 retry) before fallthrough; Retry-After honored (capped at 30s).
  • Resolution path is observableresolution_path carries cloud / cloud-after-retry / local-fallback; "deferred" shape is the deferred state.
  • Tests — happy cloud, 429 → retry → ok, 429 → retry → 429 → local fallback, cloud 5xx → local fallback, local fail → defer.
  • Diff coverage ≥ 80% — new code is dominated by typed enums + unit-tested helpers; integration tests drive the orchestrator end-to-end through the bus.

Related

Summary by CodeRabbit

  • New Features

    • Triage decisions can now be deferred until a later time, with automatic retry scheduling.
    • Implemented automatic fallback to local inference when cloud triage services are unavailable.
    • Added intelligent retry logic for transient failures during triage evaluation.
  • Bug Fixes

    • Improved error handling and resilience in triage decision-making pipeline.

Cloud failures (429, 5xx, timeout) now retry once honouring Retry-After
before falling through to a local arm; if both fail the run yields a
TriageOutcome::Deferred so the caller can reschedule instead of hard-
failing user-visible triggers.

Fixes tinyhumansai#1257
@obchain obchain requested a review from a team May 8, 2026 08:01
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements a tiered fallback chain for triage evaluation as specified in issue #1257. Cloud is attempted first; on transient failures (429, 5xx, timeout), the system retries once with backoff. If cloud exhausts its attempts, local fallback is tried. If both fail, evaluation is deferred with a computed wake-up timestamp. All triage callers now explicitly handle TriageOutcome::Decision and TriageOutcome::Deferred variants, including deferral logging without applying side effects.

Changes

Triage Tiered Fallback with Deferred Outcomes

Layer / File(s) Summary
Type Contracts & Outcomes
src/openhuman/agent/triage/evaluator.rs
Introduces TriageResolutionPath enum (Cloud, CloudAfterRetry, LocalFallback) and TriageOutcome type supporting Decision(TriageRun) or Deferred { defer_until_ms, reason }. Updates TriageRun to carry resolution_path metadata.
Module Documentation
src/openhuman/agent/triage/evaluator.rs
Updates module-level documentation to describe the tiered fallback chain: cloud first, retry on retryable failures, fallback to local, defer on exhaustion.
Provider Resolution & Local Fallback
src/openhuman/agent/triage/routing.rs, src/openhuman/agent/triage/mod.rs
Adds build_local_provider_with_config helper to construct optional local fallback providers gated by runtime enablement and chat model availability. Refactors resolve_provider_with_config to delegate to explicit remote builder. Expands module re-exports.
Core Evaluator Orchestration
src/openhuman/agent/triage/evaluator.rs
Rewrites run_triage to orchestrate cloud+local provider resolution and delegates to run_triage_with_arms. Implements tiered fallback: cloud attempt → retry on retryable errors with Retry-After capping → fallback to local → defer on exhaustion.
Error Classification & Arm Logic
src/openhuman/agent/triage/evaluator.rs
Introduces ArmError type distinguishing Retryable vs Fatal failures. Implements classify_error, is_transient_string, parse_retry_after_ms, and try_arm to handle provider-specific patterns, arm-dependent parse failure retryability, and deferred wake-up timestamps.
Provider-Specific Error Helpers
src/openhuman/providers/reliable.rs
Exposes is_upstream_unhealthy as pub(crate) helper for detecting temporary capacity/outage conditions. Maintains is_rate_limited detection logic for error classification.
Controller Response Handling
src/openhuman/agent/schemas.rs
Updates triage_evaluate to match on TriageOutcome variants. For Decision, conditionally applies apply_decision only when not dry-run and returns JSON with resolution_path and latency. For Deferred, returns JSON with defer_until_ms and reason without side effects.
Triage Caller Integration
src/openhuman/webhooks/bus.rs, src/openhuman/webhooks/ops.rs, src/openhuman/composio/bus.rs, src/openhuman/notifications/rpc.rs
Updates all triage callers to explicitly match on TriageOutcome::Decision vs Deferred. Decision paths apply decisions and return summaries; Deferred paths log warnings and skip scheduling/persistence without errors.
Comprehensive Test Suite
src/openhuman/agent/triage/evaluator_tests.rs
Adds unit tests for error classification (HTTP 429 with Retry-After, 5xx, timeout, auth/401). Introduces comprehensive integration tests: happy cloud path, 429→retry→ok, double 429→local fallback, 5xx→local fallback, all fail→deferred, fatal auth short-circuit, and no-local-arm deferral.
Test Fixture Updates
src/openhuman/agent/triage/escalation.rs
Updates test helpers run() and run_with_target() to set TriageRun.resolution_path to TriageResolutionPath::Cloud.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • senamakel

Poem

🐰 The Rabbit's Fallback Ballad

Cloud tried first with eager might,
But when the servers groaned in fright,
We retry once, then fade away,
To local models' gentle sway—
And if all fails, we softly defer,
Let morning mend the tangled blur! 🌙

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(agent/triage): tiered cloud → retry → local → defer fallback' accurately summarizes the main change: implementing a tiered fallback chain for triage evaluation.
Linked Issues check ✅ Passed All coding requirements from #1257 are met: tiered fallback chain (cloud → retry → local → defer), cloud retry budget with Retry-After support, resolution path tracking, and comprehensive test coverage across all specified scenarios.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing the tiered triage fallback chain. Updates to schemas.rs, escalation.rs, routing.rs, and caller sites are necessary to handle new TriageOutcome variants and resolution path tracking.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (7)
src/openhuman/webhooks/ops.rs (1)

183-183: ⚡ Quick win

Consider extracting hardcoded timeout constant.

The 60-second timeout for triage and apply_decision is hardcoded here and also appears in src/openhuman/webhooks/bus.rs (line 130). Extracting this to a named constant (e.g., TRIAGE_TIMEOUT_SECS) would improve maintainability and make the timeout budget more discoverable.

♻️ Optional refactor to extract timeout constant

At the top of the file:

+/// Timeout for triage evaluation and decision application (seconds).
+const TRIAGE_TIMEOUT_SECS: u64 = 60;
+
 use crate::api::config::effective_api_url;

Then use it:

     let outcome = tokio::time::timeout(
-        std::time::Duration::from_secs(60),
+        std::time::Duration::from_secs(TRIAGE_TIMEOUT_SECS),
         crate::openhuman::agent::triage::run_triage(&envelope),
     )
     .await
-    .map_err(|_| "triage timed out after 60s".to_string())?
+    .map_err(|_| format!("triage timed out after {}s", TRIAGE_TIMEOUT_SECS))?

Also applies to: 192-197

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/webhooks/ops.rs` at line 183, Replace the hardcoded
std::time::Duration::from_secs(60) used for the triage and apply_decision
timeouts with a named constant (e.g., TRIAGE_TIMEOUT_SECS) so the timeout is
discoverable and reusable; define a const TRIAGE_TIMEOUT_SECS: u64 = 60 at a
shared location (either at the top of src/openhuman/webhooks/ops.rs or in a
common module if you also update src/openhuman/webhooks/bus.rs) and then
construct durations with std::time::Duration::from_secs(TRIAGE_TIMEOUT_SECS)
wherever triage or apply_decision timeouts are set.
src/openhuman/agent/triage/evaluator.rs (4)

1-519: ⚖️ Poor tradeoff

Heads up: file is now ~519 lines, just over the ~500-line guideline.

Doc-comment, classification helpers (classify_error, is_transient_string), and the new fallback driver have pushed evaluator.rs past the soft cap. A natural split is classify.rs (the ArmError, classify_error, is_transient_string, now_ms helpers) leaving the driver, try_arm, and prompt rendering here. Not a blocker for this PR.

As per coding guidelines: "Keep source files to ≤ ~500 lines per file; split modules when growing larger."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/triage/evaluator.rs` around lines 1 - 519, The file
exceeds the ~500-line guideline; split classification helpers into a new module:
extract the ArmError enum and the functions classify_error, is_transient_string,
and now_ms into a new classify.rs, leaving try_arm, run_triage*, prompt
rendering and driver code in evaluator.rs; in evaluator.rs replace the moved
definitions with "use crate::openhuman::agent::triage::classify::{ArmError,
classify_error, is_transient_string, now_ms};" (or equivalent path), add a mod
classify; to keep tests/builds working update any imports/re-exports and the
module tree (lib.rs or mod file) so the new classify.rs is compiled. Ensure
visibility (pub as needed) matches original usage and run unit tests.

188-208: ⚡ Quick win

Cloud retry-exhaustion error is dropped before the no-local Deferred path.

When the second cloud attempt returns ArmError::Retryable, the source: anyhow::Error is discarded (the tracing::warn! at 190-193 doesn't even include it), and if local is None the Deferred at 204-207 reports only "cloud retry exhausted; local arm unavailable". That makes operator-visible deferrals on stuck cloud upstreams effectively opaque — a 5xx and a connection reset look identical in the wake-up reason. Capture the second-attempt error and thread it into both the warn log and the Deferred reason:

♻️ Proposed fix
             match try_arm(&cloud, envelope, TriageResolutionPath::CloudAfterRetry).await {
                 Ok(run) => return Ok(TriageOutcome::Decision(run)),
                 Err(ArmError::Fatal(err)) => return Err(err),
-                Err(ArmError::Retryable { .. }) => {
-                    // Exhausted cloud budget — fall through to local.
-                    tracing::warn!(
-                        "[triage::evaluator] cloud retry budget exhausted; \
-                         falling back to local arm"
-                    );
-                }
+                Err(ArmError::Retryable { source, .. }) => {
+                    // Exhausted cloud budget — fall through to local.
+                    tracing::warn!(
+                        error = %source,
+                        "[triage::evaluator] cloud retry budget exhausted; \
+                         falling back to local arm"
+                    );
+                    cloud_failure = Some(source);
+                }
             }
         }
     }
@@
-    let Some(local) = local else {
+    let Some(local) = local else {
         // No local arm available at all (runtime disabled, no model
         // configured) — the only honest outcome is a deferral so the
         // next tick retries the whole chain.
+        let cause = cloud_failure
+            .map(|e| e.to_string())
+            .unwrap_or_else(|| "unknown".to_string());
         return Ok(TriageOutcome::Deferred {
             defer_until_ms: now_ms().saturating_add(DEFER_WAKEUP_MS),
-            reason: "cloud retry exhausted; local arm unavailable".to_string(),
+            reason: format!(
+                "cloud retry exhausted ({cause}); local arm unavailable"
+            ),
         });
     };

(plus a let mut cloud_failure: Option<anyhow::Error> = None; near the top of run_triage_with_arms).

The existing test no_local_arm_returns_deferred_after_cloud_exhaustion asserts reason.contains("local arm unavailable"), which the suggestion preserves.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/triage/evaluator.rs` around lines 188 - 208, In
run_triage_with_arms, capture the Retryable error from the second cloud attempt
instead of dropping it: introduce a let mut cloud_failure: Option<anyhow::Error>
= None and when matching Err(ArmError::Retryable { source, .. }) assign
cloud_failure = Some(source); update the tracing::warn! call that currently logs
"[triage::evaluator] cloud retry budget exhausted; falling back to local arm" to
include the captured error (if any), and when returning TriageOutcome::Deferred
for the no-local path, append or interpolate the cloud_failure's error message
into the reason string (preserving the existing "local arm unavailable" text so
tests like no_local_arm_returns_deferred_after_cloud_exhaustion still pass).

214-231: ⚡ Quick win

cloud + local both failed reason only carries the local error.

The format!("cloud + local both failed: {err}") interpolates the captured local arm error, but the cloud error was already dropped at the cloud-exhaustion site above. The wording promises both, the value contains one. Combined with the fix suggested for lines 188-208, you can include both causes here:

♻️ Proposed fix
-        Err(ArmError::Fatal(err)) | Err(ArmError::Retryable { source: err, .. }) => {
-            // Local also failed — defer rather than surface a hard
-            // error. Today's "hard fail" is the wrong default for a
-            // transient blocker per `#1257`.
-            let reason = format!("cloud + local both failed: {err}");
+        Err(ArmError::Fatal(err)) | Err(ArmError::Retryable { source: err, .. }) => {
+            // Local also failed — defer rather than surface a hard
+            // error. Today's "hard fail" is the wrong default for a
+            // transient blocker per `#1257`.
+            let cloud_cause = cloud_failure
+                .map(|e| e.to_string())
+                .unwrap_or_else(|| "<not captured>".to_string());
+            let reason = format!("cloud failed ({cloud_cause}); local failed: {err}");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/triage/evaluator.rs` around lines 214 - 231, The current
Deferred reason only includes the local arm error; capture and propagate the
cloud arm error from the cloud-exhaustion site so the LocalFallback match can
include both causes: adjust the cloud-exhaustion code (the earlier try_arm
call/handling around TriageResolutionPath::Cloud) to preserve the cloud error
into a named variable (e.g., cloud_err) and return or attach it inside the
ArmError/Retryable path so the later match on try_arm(&local, ...,
TriageResolutionPath::LocalFallback) can access both errors; then build the
reason as something like format!("cloud + local both failed: cloud: {cloud_err};
local: {err}") before emitting the tracing::warn and returning
TriageOutcome::Deferred (use the existing symbols try_arm, ArmError::{Fatal,
Retryable}, TriageOutcome::Deferred, now_ms(), and DEFER_WAKEUP_MS).

404-429: 💤 Low value

is_transient_string is sensible; one minor 5xx tokenization edge.

The digit-tokenization loop at 421-427 will also flip to true for any unrelated number that happens to land in [500, 600) — e.g., a port literal :5432 won't match (out of range) but :502 would, and request IDs like req-573-… would too. In practice agent-bus error strings are well-formed enough that this is unlikely to matter, but anchoring the digits to typical HTTP-status shapes (e.g., HTTP 5xx, status 5xx, 5xx) would make the heuristic less likely to drift over time. Optional.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/triage/evaluator.rs` around lines 404 - 429, The
digit-tokenization in is_transient_string is too broad and treats any numeric
token in 500..600 as a transient HTTP 5xx; narrow this by only accepting 5xx
matches that are anchored to HTTP/status-style contexts or whole-word HTTP
codes. In practice, update the digit-tokenization loop in is_transient_string
(the block that iterates over lower.split(...) and parses tokens) to instead
search for 5xx using a stricter rule — e.g., use a regex or explicit checks that
require a surrounding word boundary (\\b5\\d{2}\\b) or the presence of nearby
cues like "http", "status", "code", or "error" before/around the digits — so
random numeric IDs/ports (e.g., req-573 or :5432) no longer trigger true. Ensure
the function continues to return true for genuine 5xx indications and false
otherwise.
src/openhuman/agent/triage/routing.rs (2)

92-105: 💤 Low value

Local-arm label/provider_name is misleading for non-llamacpp + override combos.

When OPENHUMAN_LOCAL_INFERENCE_URL is set, use_openai_compat becomes true regardless of provider_kind, but the label collapses to "llamacpp" for everything that is not "custom_openai" — including "ollama", "llama-server", or unknown values. That label is then surfaced as provider_name in ResolvedProvider and in the tracing::debug! line below, so an Ollama-with-override deployment will show up in logs/telemetry as llamacpp, which makes the new resolution_path instrumentation harder to trust on degraded turns.

Consider preserving the configured kind (or at least distinguishing llama-server from llamacpp):

♻️ Proposed fix
-    let (label, base) = if use_openai_compat {
-        let base = override_base
-            .or_else(|| local_cfg.base_url.clone())
-            .unwrap_or_else(|| "http://127.0.0.1:8080/v1".to_string());
-        let label = if provider_kind == "custom_openai" {
-            "custom_openai"
-        } else {
-            "llamacpp"
-        };
-        (label, base)
-    } else {
+    let (label, base) = if use_openai_compat {
+        let base = override_base
+            .or_else(|| local_cfg.base_url.clone())
+            .unwrap_or_else(|| "http://127.0.0.1:8080/v1".to_string());
+        let label: &str = match provider_kind.as_str() {
+            "custom_openai" => "custom_openai",
+            "llama-server" => "llama-server",
+            "llamacpp" => "llamacpp",
+            _ => "openai_compat",
+        };
+        (label, base)
+    } else {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/triage/routing.rs` around lines 92 - 105, The current
branch that sets (label, base) when use_openai_compat is true collapses every
non-"custom_openai" provider_kind into "llamacpp", causing mislabeling in
ResolvedProvider/tracing; change the logic in the block that assigns label (the
let (label, base) = ... expression) so that label preserves the original
provider_kind (or maps only "custom_openai" → "custom_openai") instead of
defaulting to "llamacpp" — e.g., set label = provider_kind.clone() or match
provider_kind to explicitly preserve "ollama", "llama-server", "llamacpp", etc.;
keep the base selection behavior using override_base.or_else(...) as-is.

81-105: 💤 Low value

Document the OPENHUMAN_LOCAL_INFERENCE_URL contract to clarify whether the /v1 path component is required.

OpenAiCompatibleProvider::new only trims trailing slashes and does not append /v1 itself. The chat_completions_url() method (src/openhuman/providers/compatible.rs:246–264) appends /chat/completions only if the base URL doesn't already contain that endpoint, but it never normalizes or auto-appends /v1. This differs from the Ollama branch, which explicitly appends /v1, and the hard-coded default, which includes it. If a user sets OPENHUMAN_LOCAL_INFERENCE_URL=http://host:8080 without the /v1 prefix, the client will POST to http://host:8080/chat/completions instead of the expected /v1/chat/completions, causing triage failures. Either enforce the /v1 suffix in code or document that the env var must include it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/triage/routing.rs` around lines 81 - 105, The env var
OPENHUMAN_LOCAL_INFERENCE_URL is currently only trimmed of trailing slashes in
the routing logic (see override_base and use_openai_compat) but the code path
for OpenAiCompatibleProvider (OpenAiCompatibleProvider::new and
chat_completions_url) does not normalize or append a /v1 segment, causing
mismatched endpoints compared to the ollama branch; fix by normalizing
override_base to ensure it always ends with "/v1" (unless it already contains a
path like "/v1" or "/v1/") before it is used or passed into
OpenAiCompatibleProvider, so that local_cfg.base_url/or override_base yields a
base that will produce requests to /v1/chat/completions consistently across
use_openai_compat and ollama branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/agent/triage/evaluator.rs`:
- Around line 1-519: The file exceeds the ~500-line guideline; split
classification helpers into a new module: extract the ArmError enum and the
functions classify_error, is_transient_string, and now_ms into a new
classify.rs, leaving try_arm, run_triage*, prompt rendering and driver code in
evaluator.rs; in evaluator.rs replace the moved definitions with "use
crate::openhuman::agent::triage::classify::{ArmError, classify_error,
is_transient_string, now_ms};" (or equivalent path), add a mod classify; to keep
tests/builds working update any imports/re-exports and the module tree (lib.rs
or mod file) so the new classify.rs is compiled. Ensure visibility (pub as
needed) matches original usage and run unit tests.
- Around line 188-208: In run_triage_with_arms, capture the Retryable error from
the second cloud attempt instead of dropping it: introduce a let mut
cloud_failure: Option<anyhow::Error> = None and when matching
Err(ArmError::Retryable { source, .. }) assign cloud_failure = Some(source);
update the tracing::warn! call that currently logs "[triage::evaluator] cloud
retry budget exhausted; falling back to local arm" to include the captured error
(if any), and when returning TriageOutcome::Deferred for the no-local path,
append or interpolate the cloud_failure's error message into the reason string
(preserving the existing "local arm unavailable" text so tests like
no_local_arm_returns_deferred_after_cloud_exhaustion still pass).
- Around line 214-231: The current Deferred reason only includes the local arm
error; capture and propagate the cloud arm error from the cloud-exhaustion site
so the LocalFallback match can include both causes: adjust the cloud-exhaustion
code (the earlier try_arm call/handling around TriageResolutionPath::Cloud) to
preserve the cloud error into a named variable (e.g., cloud_err) and return or
attach it inside the ArmError/Retryable path so the later match on
try_arm(&local, ..., TriageResolutionPath::LocalFallback) can access both
errors; then build the reason as something like format!("cloud + local both
failed: cloud: {cloud_err}; local: {err}") before emitting the tracing::warn and
returning TriageOutcome::Deferred (use the existing symbols try_arm,
ArmError::{Fatal, Retryable}, TriageOutcome::Deferred, now_ms(), and
DEFER_WAKEUP_MS).
- Around line 404-429: The digit-tokenization in is_transient_string is too
broad and treats any numeric token in 500..600 as a transient HTTP 5xx; narrow
this by only accepting 5xx matches that are anchored to HTTP/status-style
contexts or whole-word HTTP codes. In practice, update the digit-tokenization
loop in is_transient_string (the block that iterates over lower.split(...) and
parses tokens) to instead search for 5xx using a stricter rule — e.g., use a
regex or explicit checks that require a surrounding word boundary
(\\b5\\d{2}\\b) or the presence of nearby cues like "http", "status", "code", or
"error" before/around the digits — so random numeric IDs/ports (e.g., req-573 or
:5432) no longer trigger true. Ensure the function continues to return true for
genuine 5xx indications and false otherwise.

In `@src/openhuman/agent/triage/routing.rs`:
- Around line 92-105: The current branch that sets (label, base) when
use_openai_compat is true collapses every non-"custom_openai" provider_kind into
"llamacpp", causing mislabeling in ResolvedProvider/tracing; change the logic in
the block that assigns label (the let (label, base) = ... expression) so that
label preserves the original provider_kind (or maps only "custom_openai" →
"custom_openai") instead of defaulting to "llamacpp" — e.g., set label =
provider_kind.clone() or match provider_kind to explicitly preserve "ollama",
"llama-server", "llamacpp", etc.; keep the base selection behavior using
override_base.or_else(...) as-is.
- Around line 81-105: The env var OPENHUMAN_LOCAL_INFERENCE_URL is currently
only trimmed of trailing slashes in the routing logic (see override_base and
use_openai_compat) but the code path for OpenAiCompatibleProvider
(OpenAiCompatibleProvider::new and chat_completions_url) does not normalize or
append a /v1 segment, causing mismatched endpoints compared to the ollama
branch; fix by normalizing override_base to ensure it always ends with "/v1"
(unless it already contains a path like "/v1" or "/v1/") before it is used or
passed into OpenAiCompatibleProvider, so that local_cfg.base_url/or
override_base yields a base that will produce requests to /v1/chat/completions
consistently across use_openai_compat and ollama branches.

In `@src/openhuman/webhooks/ops.rs`:
- Line 183: Replace the hardcoded std::time::Duration::from_secs(60) used for
the triage and apply_decision timeouts with a named constant (e.g.,
TRIAGE_TIMEOUT_SECS) so the timeout is discoverable and reusable; define a const
TRIAGE_TIMEOUT_SECS: u64 = 60 at a shared location (either at the top of
src/openhuman/webhooks/ops.rs or in a common module if you also update
src/openhuman/webhooks/bus.rs) and then construct durations with
std::time::Duration::from_secs(TRIAGE_TIMEOUT_SECS) wherever triage or
apply_decision timeouts are set.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7f73ecb6-a181-4d8b-9015-c5ad1a2373ac

📥 Commits

Reviewing files that changed from the base of the PR and between 2fe6f71 and 5554bf3.

📒 Files selected for processing (11)
  • src/openhuman/agent/schemas.rs
  • src/openhuman/agent/triage/escalation.rs
  • src/openhuman/agent/triage/evaluator.rs
  • src/openhuman/agent/triage/evaluator_tests.rs
  • src/openhuman/agent/triage/mod.rs
  • src/openhuman/agent/triage/routing.rs
  • src/openhuman/composio/bus.rs
  • src/openhuman/notifications/rpc.rs
  • src/openhuman/providers/reliable.rs
  • src/openhuman/webhooks/bus.rs
  • src/openhuman/webhooks/ops.rs

@senamakel senamakel merged commit 3953a13 into tinyhumansai:main May 9, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Triage tiered fallback: cloud → wait-and-retry → fallback-local → defer

2 participants