perf(agent): orchestrator harness efficiency improvements#1314
perf(agent): orchestrator harness efficiency improvements#1314senamakel merged 10 commits intotinyhumansai:mainfrom
Conversation
…ai#1141) - Re-enable payload summarizer at 4,000-token threshold (was disabled at 0 after recursive-dispatch was observed; root cause was the missing `omit_skills_catalog = true` guard on the summarizer archetype) - Skip `render_tools()` catalogue for native function-calling sessions (provider already sends JSON schemas in the API payload — duplicate in the system prompt was pure token bloat) - Consolidate 6 separate `memory_tree_*` tools into one `memory_tree` tool dispatched by `mode` enum, shrinking the orchestrator's tool surface by 5 entries per API call - Remove `schedule` and `spawn_subagent` from orchestrator tool list; `cron_add/cron_list/cron_remove` are the canonical scheduling surface and `delegate_*` + `spawn_worker_thread` replace `spawn_subagent` - Remove redundant "Available Sub-Agents" and "Direct Tools" tables from orchestrator prompt (LLM already sees these as native API tool schemas) - Compact `render_delegation_guide` to a single line per toolkit instead of a verbose multi-line spawn_subagent snippet - Add `max_result_chars` to `AgentDefinition`; set 8k for planner and researcher, 16k for code_executor to prevent context flooding - Wrap `ParentExecutionContext::memory_context` in `Arc<Option<String>>` so sub-agent spawns share the same allocation instead of cloning - Add `docs/DELEGATION_POLICY.md` codifying the 4-tier direct-first decision tree - Unit tests for all new behavior (ToolsSection native skip, memory_tree dispatcher modes, max_result_chars truncation, delegation guide compact format) Closes tinyhumansai#1141
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthroughThis PR adds per-agent output caps and UTF‑8-safe truncation, consolidates six memory-tree tools into a single mode-dispatching MemoryTreeTool, converts ParentExecutionContext.memory_context to Arc-wrapped ownership, narrows orchestrator delegation/tool surfaces and prompt text, introduces per-turn agent scoping in channel dispatch, re-enables payload-summarizer threshold, and updates tests/docs. ChangesDelegation Policy & Orchestrator Tooling
Sub-agent Output Control
Memory Tree Tool Consolidation
Sequence Diagram(s)sequenceDiagram
participant User as User/Channel
participant Dispatch as Channel Dispatch
participant DefReg as AgentDefinition Registry
participant Orch as Orchestrator Agent
participant SubAgent as Sub-agent Runner
participant Summarizer as Payload Summarizer
User->>Dispatch: AgentTurnRequest
Dispatch->>DefReg: resolve_target_agent()
DefReg-->>Dispatch: agent definition + connected integrations
Dispatch->>Orch: AgentTurnRequest + target_agent_id + visible_tool_names + extra_tools
Orch->>Orch: select delegation target (direct tool, sub-agent, or worker)
alt Direct Tool Call
Orch->>Orch: execute tool directly
else Delegate to Sub-agent
Orch->>SubAgent: run_subagent(agent_id, prompt, parent_context)
SubAgent->>DefReg: fetch sub-agent definition (incl. max_result_chars)
DefReg-->>SubAgent: definition + output cap
SubAgent->>SubAgent: execute sub-agent logic
SubAgent->>SubAgent: truncate output if exceeds max_result_chars
SubAgent->>Summarizer: maybe_summarize(output) if > threshold tokens
Summarizer-->>SubAgent: summarized or original
SubAgent-->>Orch: SubagentRunOutcome (possibly truncated/summarized)
end
Orch-->>User: final response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/openhuman/agent/agents/orchestrator/prompt.md (1)
55-63:⚠️ Potential issue | 🟠 Major | ⚡ Quick winWorker-thread instructions still reference removed
spawn_subagentflow.This block tells the orchestrator to use
spawn_subagent(... dedicated_thread: true), but the orchestrator tool surface now usesspawn_worker_thread. Keeping this text will produce invalid tool plans.Suggested fix
-`spawn_subagent` accepts an optional `dedicated_thread: true` flag. When set, the -sub-agent's run is persisted into a fresh **worker**-labeled thread the user can -open from the thread list, and you receive a compact reference (worker thread id -+ brief summary) instead of the full sub-agent transcript. Use this **only** -when the sub-task is genuinely long or complex and the parent thread should not -be flooded with the sub-agent's output — for example multi-step research, -multi-file refactors, or batch integration work that produces a large -transcript. For everyday delegation keep `dedicated_thread` off (the default) -and surface the result inline. +Use `spawn_worker_thread` for genuinely long/complex delegated tasks where full +sub-agent transcripts would flood the parent thread. It creates a persisted +**worker** thread and returns a compact `[worker_thread_ref]` to the parent. +For routine delegation, use regular `delegate_*` tools and keep results inline.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/agents/orchestrator/prompt.md` around lines 55 - 63, Update the documentation text to stop referencing the removed spawn_subagent API and instead instruct using the current spawn_worker_thread tool and its equivalent option (e.g., dedicated_thread flag or parameter name used by spawn_worker_thread); specifically replace mentions of spawn_subagent and describe the behavior of spawn_worker_thread when invoked with the dedicated_thread:true behavior (persisting the sub-agent run into a fresh worker-labeled thread and returning a compact worker thread id + summary), and ensure examples and guidance mention using spawn_worker_thread for long-running multi-step tasks while keeping the default inline delegation for everyday delegations.src/openhuman/tools/orchestrator_tools.rs (1)
126-159:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
delegate_*tools are generated even for unconnected integrations.The wildcard expansion ignores
integration.connected. If the input slice includes unconnected entries, the tool schema exposes delegates that will fail preflight and drift from prompt guidance.Suggested fix
for integration in connected_integrations { + if !integration.connected { + continue; + } // Slug the toolkit name into a tool-name-safe form.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/tools/orchestrator_tools.rs` around lines 126 - 159, The loop that creates delegate_* tools (building tool_name via sanitise_slug and pushing SkillDelegationTool instances) currently includes integrations regardless of their connection status; modify the logic so only actually connected integrations produce delegation tools—either filter connected_integrations beforehand or early-return/continue in the loop when integration.connected is false, ensuring SkillDelegationTool entries are created only for integrations where integration.connected == true so preflight and prompt guidance remain accurate.src/openhuman/agent/agents/orchestrator/prompt.rs (1)
38-42: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd debug/trace logs around delegation-guide rendering branches.
The changed prompt assembly path now conditionally injects/removes integration guidance but has no diagnostics. Please add stable-prefix debug logs for branch decisions (connected count, section emitted/omitted) to align with the Rust observability guideline.
As per coding guidelines: "Use
log/tracingatdebugortracelevel on RPC entry/exit, error paths, state transitions, and any branch that is hard to infer from tests; include stable prefixes."Also applies to: 71-89
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/agents/orchestrator/prompt.rs` around lines 38 - 42, Add stable-prefix debug/trace logging around the delegation-guide rendering branches: when calling render_delegation_guide(ctx.connected_integrations) and before/after the trim/append branch, emit a trace/debug log that includes a stable prefix (e.g. "delegation-guide:") plus ctx.connected_integrations count and whether the section will be emitted or omitted; do the same for the other similar block (the branch covering the section at lines referenced in the review) so there is a clear log on entry, decision (emitted/omitted) and resulting string length/state. Reference the render_delegation_guide function and ctx.connected_integrations in the log messages and use the project logging crate (log/tracing) at debug or trace level per guidelines.src/openhuman/channels/runtime/dispatch.rs (1)
441-449:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMissing timeout on
fetch_connected_integrationsmay block turn dispatch.
build_connection_state_block(lines 232-245) wrapsfetch_connected_integrationsin a 3-second timeout to prevent Composio API latency from blocking the turn. However, this call at line 442 has no such protection. If the Composio API is slow or unresponsive, this could delay agent routing indefinitely.Proposed fix: wrap in timeout with fallback
let extra_tools = if !definition.subagents.is_empty() { - let connected = fetch_connected_integrations(&config).await; + const COMPOSIO_FETCH_TIMEOUT_SECS: u64 = 3; + let connected = match tokio::time::timeout( + Duration::from_secs(COMPOSIO_FETCH_TIMEOUT_SECS), + fetch_connected_integrations(&config), + ) + .await + { + Ok(list) => list, + Err(_) => { + tracing::warn!( + channel = %channel, + target_agent = target_id, + "[dispatch::routing] Composio fetch timed out — proceeding without connected integrations" + ); + Vec::new() + } + }; tracing::debug!( channel = %channel, target_agent = target_id, connected_integration_count = connected.len(), "[dispatch::routing] fetched connected integrations for delegation expansion" );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/channels/runtime/dispatch.rs` around lines 441 - 449, fetch_connected_integrations at dispatch.rs (used when computing extra_tools) currently awaits without a timeout and can block routing; wrap the call to fetch_connected_integrations(&config).await in a tokio::time::timeout (3s, same as build_connection_state_block) and if it times out or errors, log a debug/warn including channel and target_agent (same fields used in the tracing::debug) and fall back to an empty Vec (or the previous safe default) before calling orchestrator_tools::collect_orchestrator_tools(definition, registry, &connected); ensure you handle both Ok(result) and Err(timeout) paths so extra_tools is always computed promptly.
🧹 Nitpick comments (2)
src/openhuman/agent/harness/subagent_runner/ops_tests.rs (1)
641-668: ⚡ Quick win
max_result_charstests don’t exercise the runner path.These tests currently reimplement truncation locally, so they can pass even if
run_subagenttruncation regresses. Please drive the assertion throughrun_subagentwithdef.max_result_chars = Some(...)/Noneto validate the real behavior end-to-end in this harness.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs` around lines 641 - 668, Replace the local truncation checks in the tests with end-to-end calls to run_subagent and set the subagent definition's max_result_chars field (e.g., def.max_result_chars = Some(10) for the truncation test and def.max_result_chars = None for the non‑truncation test), then assert on the actual output returned by run_subagent (or the runner result structure) instead of duplicating the truncate logic; update the tests max_result_chars_cap_is_enforced and max_result_chars_not_applied_when_none to build the minimal SubagentDef/inputs needed, call run_subagent, and validate that the returned output starts/ends or equals the expected values.src/openhuman/channels/runtime/dispatch.rs (1)
506-519: 💤 Low valueConsider extracting tests to a sibling
dispatch_tests.rsmodule.The file is ~1,425 lines, exceeding the ~500 line guideline. The
scoping_testsandtestsmodules together account for ~400 lines. Moving them to a dedicateddispatch_tests.rs(ortests/dispatch_tests.rs) would keep the main module focused on operational code per the coding guidelines.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/channels/runtime/dispatch.rs` around lines 506 - 519, The file-level tests (the #[cfg(test)] mod scoping_tests and the large tests module) should be extracted into a sibling test module file named dispatch_tests.rs to keep the main module under ~500 lines: create dispatch_tests.rs containing the scoping_tests and tests modules, move any helper imports from the parent (adjust use paths from super:: or crate:: as needed), ensure the moved tests reference symbols like scoping_tests, runtime_dispatch::dispatch_routes_through_agent_run_turn_bus_handler, and any private items by either making those items pub(crate) or re-exporting needed helpers, and then replace the in-file test modules with a single mod dispatch_tests; declaration guarded by #[cfg(test)] so the tests compile and run unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/agent/agents/orchestrator/prompt.rs`:
- Around line 84-85: render_delegation_guide currently interpolates ci.toolkit
directly into "delegate_{ci.toolkit}", which can mismatch runtime-synthesized
tool names that are sanitized; change it to compute and use the same sanitizer
used at tool creation (e.g. let sanitized = sanitize_toolkit_slug(&ci.toolkit)
or call canonicalize_toolkit_name(&ci.toolkit)) and interpolate
"delegate_{sanitized}" instead of raw ci.toolkit, and import/use the existing
sanitizer function used elsewhere so prompt names always match runtime tool
names.
In `@src/openhuman/agent/harness/subagent_runner/ops.rs`:
- Around line 108-117: The truncation uses byte-based APIs causing potential
UTF-8 boundary panics and misreports character count: compute the original
character count with outcome.output.chars().count(), log that value instead of
outcome.output.len(), and when truncating use a character-safe byte index (e.g.,
find the byte offset with outcome.output.char_indices().nth(cap).map(|(i,_)|
i).unwrap_or(outcome.output.len())) then call
outcome.output.truncate(byte_offset) and append the "[...truncated]" suffix;
update the tracing::debug call to use the char count and the max_result_chars
(definition.max_result_chars) for accurate logging.
In `@src/openhuman/agent/prompts/mod.rs`:
- Around line 398-403: The current early return when ctx.tool_call_format ==
ToolCallFormat::Native discards any non-empty dispatcher_instructions; change
this logic to only suppress dispatcher_instructions for Native mode when
ctx.dispatcher_instructions (produced by
NativeToolDispatcher.prompt_instructions()) is empty, otherwise include/pass
through the dispatcher_instructions into the prompt; keep the token-saving
behavior for truly empty dispatcher_instructions but ensure
ctx.tool_call_format, ToolCallFormat::Native, and dispatcher_instructions are
checked and used accordingly.
---
Outside diff comments:
In `@src/openhuman/agent/agents/orchestrator/prompt.md`:
- Around line 55-63: Update the documentation text to stop referencing the
removed spawn_subagent API and instead instruct using the current
spawn_worker_thread tool and its equivalent option (e.g., dedicated_thread flag
or parameter name used by spawn_worker_thread); specifically replace mentions of
spawn_subagent and describe the behavior of spawn_worker_thread when invoked
with the dedicated_thread:true behavior (persisting the sub-agent run into a
fresh worker-labeled thread and returning a compact worker thread id + summary),
and ensure examples and guidance mention using spawn_worker_thread for
long-running multi-step tasks while keeping the default inline delegation for
everyday delegations.
In `@src/openhuman/agent/agents/orchestrator/prompt.rs`:
- Around line 38-42: Add stable-prefix debug/trace logging around the
delegation-guide rendering branches: when calling
render_delegation_guide(ctx.connected_integrations) and before/after the
trim/append branch, emit a trace/debug log that includes a stable prefix (e.g.
"delegation-guide:") plus ctx.connected_integrations count and whether the
section will be emitted or omitted; do the same for the other similar block (the
branch covering the section at lines referenced in the review) so there is a
clear log on entry, decision (emitted/omitted) and resulting string
length/state. Reference the render_delegation_guide function and
ctx.connected_integrations in the log messages and use the project logging crate
(log/tracing) at debug or trace level per guidelines.
In `@src/openhuman/channels/runtime/dispatch.rs`:
- Around line 441-449: fetch_connected_integrations at dispatch.rs (used when
computing extra_tools) currently awaits without a timeout and can block routing;
wrap the call to fetch_connected_integrations(&config).await in a
tokio::time::timeout (3s, same as build_connection_state_block) and if it times
out or errors, log a debug/warn including channel and target_agent (same fields
used in the tracing::debug) and fall back to an empty Vec (or the previous safe
default) before calling
orchestrator_tools::collect_orchestrator_tools(definition, registry,
&connected); ensure you handle both Ok(result) and Err(timeout) paths so
extra_tools is always computed promptly.
In `@src/openhuman/tools/orchestrator_tools.rs`:
- Around line 126-159: The loop that creates delegate_* tools (building
tool_name via sanitise_slug and pushing SkillDelegationTool instances) currently
includes integrations regardless of their connection status; modify the logic so
only actually connected integrations produce delegation tools—either filter
connected_integrations beforehand or early-return/continue in the loop when
integration.connected is false, ensuring SkillDelegationTool entries are created
only for integrations where integration.connected == true so preflight and
prompt guidance remain accurate.
---
Nitpick comments:
In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs`:
- Around line 641-668: Replace the local truncation checks in the tests with
end-to-end calls to run_subagent and set the subagent definition's
max_result_chars field (e.g., def.max_result_chars = Some(10) for the truncation
test and def.max_result_chars = None for the non‑truncation test), then assert
on the actual output returned by run_subagent (or the runner result structure)
instead of duplicating the truncate logic; update the tests
max_result_chars_cap_is_enforced and max_result_chars_not_applied_when_none to
build the minimal SubagentDef/inputs needed, call run_subagent, and validate
that the returned output starts/ends or equals the expected values.
In `@src/openhuman/channels/runtime/dispatch.rs`:
- Around line 506-519: The file-level tests (the #[cfg(test)] mod scoping_tests
and the large tests module) should be extracted into a sibling test module file
named dispatch_tests.rs to keep the main module under ~500 lines: create
dispatch_tests.rs containing the scoping_tests and tests modules, move any
helper imports from the parent (adjust use paths from super:: or crate:: as
needed), ensure the moved tests reference symbols like scoping_tests,
runtime_dispatch::dispatch_routes_through_agent_run_turn_bus_handler, and any
private items by either making those items pub(crate) or re-exporting needed
helpers, and then replace the in-file test modules with a single mod
dispatch_tests; declaration guarded by #[cfg(test)] so the tests compile and run
unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: aca83efd-844a-4c0c-8e07-8c280c801a6f
📒 Files selected for processing (25)
docs/DELEGATION_POLICY.mdsrc/openhuman/agent/agents/code_executor/agent.tomlsrc/openhuman/agent/agents/orchestrator/agent.tomlsrc/openhuman/agent/agents/orchestrator/prompt.mdsrc/openhuman/agent/agents/orchestrator/prompt.rssrc/openhuman/agent/agents/planner/agent.tomlsrc/openhuman/agent/agents/researcher/agent.tomlsrc/openhuman/agent/harness/definition.rssrc/openhuman/agent/harness/definition_tests.rssrc/openhuman/agent/harness/fork_context.rssrc/openhuman/agent/harness/payload_summarizer.rssrc/openhuman/agent/harness/session/turn.rssrc/openhuman/agent/harness/subagent_runner/ops.rssrc/openhuman/agent/harness/subagent_runner/ops_tests.rssrc/openhuman/agent/prompts/mod.rssrc/openhuman/agent/prompts/mod_tests.rssrc/openhuman/agent/triage/escalation.rssrc/openhuman/channels/runtime/dispatch.rssrc/openhuman/config/schema/context.rssrc/openhuman/tools/impl/agent/spawn_worker_thread.rssrc/openhuman/tools/impl/memory/tree/mod.rssrc/openhuman/tools/ops.rssrc/openhuman/tools/orchestrator_tools.rstests/agent_harness_public.rstests/calendar_grounding_e2e.rs
senamakel
left a comment
There was a problem hiding this comment.
resolve coderabbit issues and checks
…urface changes spawn_subagent was removed in tinyhumansai#1141; spawn_worker_thread and memory_tree are the new entries the test should assert on. Pins the new contract so future TOML edits can't silently revert it.
String::len() counts bytes; String::truncate(n) panics if n lands in the middle of a multi-byte UTF-8 sequence. Switch to chars().count() for the guard and char_indices().nth(cap) to find the safe byte offset before truncating. Also fix the tracing::debug log field — original_chars now correctly counts characters, not bytes. Fixes @coderabbitai comment on ops.rs:117.
… tools render_delegation_guide was printing raw ci.toolkit as the delegate_* tool name, but collect_orchestrator_tools sanitises toolkit slugs via sanitise_slug() first. A quirky slug (e.g. "Google Calendar" with a space) would print a tool name in the prompt that never exists in the function-calling schema. Fix: export sanitise_slug as pub(crate) and use it in render_delegation_guide so prompt and schema always agree. Also: - Skip unconnected integrations in collect_orchestrator_tools (previously delegate_* tools were synthesised for disconnected toolkits, whose pre-flight check would immediately reject any call). - Add tracing::debug log lines around the delegation-guide rendering branches (connected count, emit/omit decision, section size). - Update prompt.md to remove stale spawn_subagent reference in "Dedicated worker threads" section — the tool was removed in tinyhumansai#1141. - Update prompt.rs module docstring accordingly. Addresses @coderabbitai comments on prompt.rs:85, orchestrator_tools.rs:126-159, prompt.md:55-63, and prompt.rs:38-42.
…rmat The ToolsSection early-return on ToolCallFormat::Native discarded any non-empty dispatcher_instructions. NativeToolDispatcher::prompt_instructions() returns a non-empty "## Tool Use Protocol" block that models need in order to understand they should emit native tool calls rather than narrate intent. New behaviour: if dispatcher_instructions is non-empty for Native mode, return it directly (the tool catalogue is still omitted to save tokens); if empty, return String::new() as before. Adds a test to pin the new contract. Addresses @coderabbitai comment on prompts/mod.rs:398-403.
…ing path build_connection_state_block already wraps its fetch in a 3-second timeout to prevent Composio API latency from blocking turn processing, but the fetch at the delegation-tools expansion site (line ~442) had no such protection. Slow or unresponsive Composio API could block agent routing indefinitely. Wrap with the same 3s tokio::time::timeout and fall back to Vec::new() on timeout, matching the pattern used elsewhere in this file. Addresses @coderabbitai comment on dispatch.rs:441-449.
…ed-integration tests - ops_tests.rs: add max_result_chars_cap_is_char_safe_for_multibyte to prove truncation at a multi-byte codepoint boundary does not panic; update existing tests to use the char-safe logic matching ops.rs. - orchestrator_tools.rs: add unconnected_integrations_are_skipped to pin the new connected-only filtering contract. Addresses @coderabbitai nitpick on ops_tests.rs:641-668.
|
Yes..Working on those itself. |
M3gA-Mind
left a comment
There was a problem hiding this comment.
pr-manager: all actionable CodeRabbit comments addressed in 5 commits (b47a984..97e5c91).
Deferred (low value — Standards): @coderabbitai suggested extracting the #[cfg(test)] modules from src/openhuman/channels/runtime/dispatch.rs into a sibling dispatch_tests.rs to keep the file under the ~500-line guideline. The file is currently ~1,425 lines. This is a valid long-term concern but is a pure refactor with no behavioural change and no CI impact. The right moment to do this split is when the file is next substantially modified. Deciding factor: do it then, not now.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/openhuman/agent/harness/subagent_runner/ops_tests.rs (1)
640-703: ⚡ Quick winSplit the new truncation tests into a dedicated test module/file.
This test file now exceeds ~500 lines and keeps growing. Moving
max_result_chars_*cases into a sibling test module will keep this suite maintainable.As per coding guidelines: "Keep source files to ≤ ~500 lines per file; split modules when growing larger."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs` around lines 640 - 703, Move the three truncation tests (max_result_chars_cap_is_enforced, max_result_chars_cap_is_char_safe_for_multibyte, max_result_chars_not_applied_when_none) out of the large ops_tests.rs into a new dedicated test file (e.g., create tests/ops_truncation.rs) and delete them from the original file; copy the test functions verbatim into the new file as standalone #[test] functions so they run as integration tests (no extra module imports needed), ensuring the behavior and names remain unchanged and the original ops_tests.rs file stays under ~500 lines.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/agent/agents/orchestrator/prompt.md`:
- Line 33: The prompt contains conflicting delegation terminology: one spot
recommends using delegate_{toolkit} for external integrations while the decision
tree still references integrations_agent; update the decision tree text and any
examples in prompt.md to consistently use the delegate_{toolkit} pattern
(replace mentions of integrations_agent with delegate_{toolkit}), and ensure
related lines also clarify using spawn_worker_thread for long-running tasks and
delegate_archivist for memory writes so all delegation names
(delegate_researcher, delegate_run_code, delegate_plan, delegate_critic,
delegate_archivist, delegate_{toolkit}, spawn_worker_thread) are used
consistently in the decision flow.
In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs`:
- Around line 641-702: Replace the duplicated truncation logic in these tests by
exercising the real runner path: call run_subagent with a Definition where
def.max_result_chars is set (for the cap tests) or None (for the no-cap test),
mock the provider response to return the long/multibyte strings used in the
current tests, then assert on the returned outcome.output (not a
locally-truncated string) that it is truncated to the expected character count,
ends with "[...truncated]" and handles multibyte chars safely; use the existing
test names but invoke run_subagent and check outcome.output to ensure ops.rs’s
truncation logic is actually exercised.
---
Nitpick comments:
In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs`:
- Around line 640-703: Move the three truncation tests
(max_result_chars_cap_is_enforced,
max_result_chars_cap_is_char_safe_for_multibyte,
max_result_chars_not_applied_when_none) out of the large ops_tests.rs into a new
dedicated test file (e.g., create tests/ops_truncation.rs) and delete them from
the original file; copy the test functions verbatim into the new file as
standalone #[test] functions so they run as integration tests (no extra module
imports needed), ensuring the behavior and names remain unchanged and the
original ops_tests.rs file stays under ~500 lines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9ad2b5e6-49e8-49ef-9f2a-55aac9da848b
📒 Files selected for processing (9)
src/openhuman/agent/agents/loader.rssrc/openhuman/agent/agents/orchestrator/prompt.mdsrc/openhuman/agent/agents/orchestrator/prompt.rssrc/openhuman/agent/harness/subagent_runner/ops.rssrc/openhuman/agent/harness/subagent_runner/ops_tests.rssrc/openhuman/agent/prompts/mod.rssrc/openhuman/agent/prompts/mod_tests.rssrc/openhuman/channels/runtime/dispatch.rssrc/openhuman/tools/orchestrator_tools.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- src/openhuman/tools/orchestrator_tools.rs
- src/openhuman/agent/harness/subagent_runner/ops.rs
- src/openhuman/channels/runtime/dispatch.rs
…ed tool tinyhumansai#1141 consolidated 6 individual memory_tree_* tools into a single memory_tree tool with mode dispatch. The integration test in tests/agent_retrieval_e2e.rs still asserted all 6 old names existed in the orchestrator TOML, causing CI to fail. Update the test to: - Assert memory_tree IS present (the new consolidated tool) - Assert none of the old individual names are present (pinning removal)
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/agent_retrieval_e2e.rs (1)
94-111:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winStrengthen the
memory_treeassertion to check exact tool entriesLine 95 currently does a broad substring match, so this can pass even if
memory_treeappears only in comments/text and not as a real tool entry. Prefer exact entry checks in the TOML list.Proposed fix
let toml = include_str!("../src/openhuman/agent/agents/orchestrator/agent.toml"); - assert!( - toml.contains("memory_tree"), - "orchestrator agent.toml must list 'memory_tree' so the LLM can call it" - ); + let has_memory_tree_entry = toml + .lines() + .map(str::trim) + .any(|line| line == "\"memory_tree\"" || line == "\"memory_tree\","); + assert!( + has_memory_tree_entry, + "orchestrator agent.toml must list 'memory_tree' so the LLM can call it" + ); @@ for old_name in [ "memory_tree_search_entities", "memory_tree_query_topic", "memory_tree_query_source", "memory_tree_query_global", "memory_tree_drill_down", "memory_tree_fetch_leaves", ] { + let old_entry = format!("\"{old_name}\""); + let old_entry_comma = format!("\"{old_name}\","); + let old_tool_present = toml + .lines() + .map(str::trim) + .any(|line| line == old_entry || line == old_entry_comma); assert!( - !toml.contains(old_name), + !old_tool_present, "orchestrator agent.toml must NOT list '{old_name}' — removed in `#1141` (use 'memory_tree' with mode= dispatch)" ); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/agent_retrieval_e2e.rs` around lines 94 - 111, The current assert uses toml.contains("memory_tree") which can match comments or other text; instead parse or precisely match the tool table entry for the orchestrator to ensure a real tool is declared. Update the test around the toml variable to either (a) parse the TOML string and assert there is a tool entry with name == "memory_tree" (using the toml crate or a small parser), or (b) assert the exact snippet that represents the tool entry exists (for example the table/array entry that contains name = "memory_tree" and any required fields), keeping the existing negative checks for the old_name array intact; reference the toml variable and the loop over old_name in tests/agent_retrieval_e2e.rs when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tests/agent_retrieval_e2e.rs`:
- Around line 94-111: The current assert uses toml.contains("memory_tree") which
can match comments or other text; instead parse or precisely match the tool
table entry for the orchestrator to ensure a real tool is declared. Update the
test around the toml variable to either (a) parse the TOML string and assert
there is a tool entry with name == "memory_tree" (using the toml crate or a
small parser), or (b) assert the exact snippet that represents the tool entry
exists (for example the table/array entry that contains name = "memory_tree" and
any required fields), keeping the existing negative checks for the old_name
array intact; reference the toml variable and the loop over old_name in
tests/agent_retrieval_e2e.rs when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8eba3497-f2bb-43d8-a5b9-c7faaef1f6f1
📒 Files selected for processing (1)
tests/agent_retrieval_e2e.rs
- orchestrator/prompt.md: replace `integrations_agent` references in
the decision tree with the actual synthesised `delegate_{toolkit}` /
`delegate_run_code` / `delegate_plan` etc. tool names so the prompt
matches what the LLM sees in its function-calling schema
- tests/agent_retrieval_e2e.rs: strengthen memory_tree tool assertion to
use exact line matching (`"memory_tree"` / `"memory_tree",`) rather
than a substring match that could hit comments or prefixed names
- subagent_runner/ops_truncation_tests.rs: extract the three
max_result_chars truncation tests into their own file, keeping
ops_tests.rs under the 500-line guideline threshold
|
@sanil-23 pls review |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/openhuman/agent/harness/subagent_runner/ops_truncation_tests.rs (1)
7-67:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThese tests validate duplicated logic, not the production
run_subagenttruncation path.Right now they can stay green even if
src/openhuman/agent/harness/subagent_runner/ops.rstruncation regresses. Please drive these assertions throughrun_subagent(def.max_result_chars+ provider response +outcome.output) instead of re-implementing truncation in-test.Suggested direction
-#[test] -fn max_result_chars_cap_is_enforced() { - // local truncation implementation... -} +#[tokio::test] +async fn max_result_chars_cap_is_enforced() { + // Arrange parent/provider + def.max_result_chars = Some(10) + // Act: run_subagent(...) + // Assert: outcome.output == "hello worl\n[...truncated]" +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/harness/subagent_runner/ops_truncation_tests.rs` around lines 7 - 67, Tests currently re-implement truncation logic locally instead of exercising the production path; update them to call run_subagent and assert on the returned Outcome (outcome.output) when def.max_result_chars is set and the mocked provider response is long, so truncation happens in src/openhuman/agent/harness/subagent_runner/ops.rs; specifically, replace the inline truncation in the three tests with setups that (1) configure a SubagentDef with max_result_chars = Some(cap), (2) mock the provider response to return the long string, (3) invoke run_subagent, and (4) assert the outcome.output length/content and that it ends with "[...truncated]" (also include the case cap = None to ensure no truncation).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/openhuman/agent/harness/subagent_runner/ops_truncation_tests.rs`:
- Around line 7-67: Tests currently re-implement truncation logic locally
instead of exercising the production path; update them to call run_subagent and
assert on the returned Outcome (outcome.output) when def.max_result_chars is set
and the mocked provider response is long, so truncation happens in
src/openhuman/agent/harness/subagent_runner/ops.rs; specifically, replace the
inline truncation in the three tests with setups that (1) configure a
SubagentDef with max_result_chars = Some(cap), (2) mock the provider response to
return the long string, (3) invoke run_subagent, and (4) assert the
outcome.output length/content and that it ends with "[...truncated]" (also
include the case cap = None to ensure no truncation).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8363f936-7bb7-43af-a5b8-8b0ec016f509
📒 Files selected for processing (5)
src/openhuman/agent/agents/orchestrator/prompt.mdsrc/openhuman/agent/harness/subagent_runner/ops.rssrc/openhuman/agent/harness/subagent_runner/ops_tests.rssrc/openhuman/agent/harness/subagent_runner/ops_truncation_tests.rstests/agent_retrieval_e2e.rs
✅ Files skipped from review due to trivial changes (1)
- tests/agent_retrieval_e2e.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/openhuman/agent/agents/orchestrator/prompt.md
| - If complex multi-step decomposition is required, use `delegate_plan`. | ||
| - If code review is requested, use `delegate_critic`. | ||
| - If memory archiving or distillation is required, use `delegate_archivist`. | ||
| 4. **After delegation**, summarise results clearly and concisely. |
There was a problem hiding this comment.
Drift risk: archetype names in this prose can diverge from the synthesised delegate_* tools.
Step 3 here lists delegate_run_code, delegate_researcher, delegate_plan, delegate_critic, delegate_archivist as hardcoded markdown. The summary line below repeats them. Meanwhile the source of truth for which delegate_* tools actually exist is agents/orchestrator/agent.toml's subagents = [...] field, expanded into tool schemas by collect_orchestrator_tools.
If anyone later removes (say) archivist from subagents, the synthesised delegate_archivist tool disappears from the API tool schema, but this prompt still tells the LLM to "use delegate_archivist for memory writes." The model attempts to call a tool that doesn't exist.
This PR did the right thing for integrations — render_delegation_guide derives the ## Connected Integrations block from runtime data, so prompt and schema cannot drift. But archetypes still go through hardcoded prose.
Two options:
- Make
render_delegation_guidealso iteratedefinition.subagentsarchetype entries and emit one line per archetype using each target'swhen_to_use. Then archetypes and integrations share one drift-free source — same guarantee this PR already gives integrations. - Cheap fallback: add a startup test that asserts every
delegate_<archetype>literal mentioned inprompt.mdappears in the synthesised orchestrator tool set. Catches drift in CI without restructuring the prompt.
Either is acceptable; option 1 is the principled fix.
| Only the integrations listed below are currently authorised. \ | ||
| If the user asks about another service, tell them to connect \ | ||
| it in **Skills** page before retrying.\n\n", | ||
| "## Connected Integrations\n\n\ |
There was a problem hiding this comment.
Question for you: now that each connected toolkit gets its own delegate_{toolkit} tool with auto-generated description, is the per-bullet ## Connected Integrations block still pulling its weight?
Each connected toolkit is described in two places after this PR:
- The
delegate_{toolkit}tool's schema description inorchestrator_tools.rs:142-150(sent to the API on every request) - The bullet line this function renders into the system prompt
Both carry the toolkit name, the tool to call, and the description. The only unique signal this prompt block adds is the naming-pattern hint at the top (Delegate tasks for these services using the matching delegate_{toolkit} tool).
The rest of this PR's token wins (skipping render_tools() for native, dropping the "Available Sub-Agents" + "Direct Tools" tables) are explicitly motivated by don't repeat the schema in prose. The same logic applies here, but this PR keeps the prose listing.
Two coherent shapes — picking either is fine; picking both (this PR's current state) is the worst of both worlds:
Option A — Drop the per-toolkit list, keep just the naming-pattern hint.
Replace this entire function's output with one line in the rules section:
External integrations are exposed as
delegate_{toolkit}tools — only currently-connected toolkits are present in your tool list.
Saves ~30–50 prompt tokens per connected integration on every cache miss. Loses nothing the schema doesn't already encode. Consistent with Wins 2 and 5 of this PR.
Option B — Drop the per-toolkit delegate_* tools, go back to one generic spawn_subagent-style tool with a toolkit parameter, and keep the prompt list.
Cheaper in tool-schema bytes (one schema instead of N), but loses the tool-selection-accuracy win that justifies Step 4 of this PR. Effectively reverses part of the refactor.
Which way did you intend? If A, this function shrinks to a single static line. If B, the synthesis logic in orchestrator_tools.rs simplifies.
# Conflicts: # src/openhuman/agent/agents/orchestrator/agent.toml # src/openhuman/tools/ops.rs
Summary
Implements the changes from the orchestrator harness efficiency audit (#1141) to reduce token waste, tighten delegation behavior, and improve sub-agent context efficiency.
omit_skills_catalog = falseon the summarizer archetype — fixed by the existing flag)render_tools()for native function-calling: provider already sends JSON schemas in the API payload — the system-prompt tool catalogue was pure duplicate tokens (~1,500–3,000 tokens/turn savings on native sessions)memory_tree_*tools → 1memory_treewithmodeenum: shrinks orchestrator's tool surface by 5 entries per API call without losing any capabilityschedule+spawn_subagentfrom orchestrator:cron_add/cron_list/cron_removeare the canonical scheduling surface;delegate_*+spawn_worker_threadreplacespawn_subagentrender_delegation_guide: single line per toolkit instead of verbose multi-line spawn_subagent snippetmax_result_charscap: 8k for planner/researcher, 16k for code_executor — prevents context flooding from large sub-agent outputsmemory_contextinArc<>:ParentExecutionContext::memory_contextis nowArc<Option<String>>— spawning a sub-agent shares the allocation instead of cloning potentially large stringsdocs/DELEGATION_POLICY.md: codifies the 4-tier direct-first decision treeTest plan
cargo check -p openhuman— clean (0 errors)cargo clippy -p openhuman— clean (0 errors)cargo fmt --all --check— cleancargo test -p openhuman --lib— 5,982 pass, 9 pre-existing flaky tests (all pass in isolation: network/port-conflict dependent, unrelated to this change)NODE_ENV=test pnpm --filter openhuman-app compile— cleanNODE_ENV=test pnpm --filter openhuman-app lint— 0 errors (32 pre-existing warnings)NODE_ENV=test pnpm --filter openhuman-app format:check— cleanToolsSectionnative-skip,MemoryTreeTooldispatcher (5 tests),max_result_charstruncation (2 tests), delegation guide compact format (1 test)prompt.rstests passtools_sectiontests pass (including 2 new Native/PFormat variants)Closes #1141
Summary by CodeRabbit
Documentation
New Features
Improvements
Tests