Skip to content

perf(agent): orchestrator harness efficiency improvements#1314

Merged
senamakel merged 10 commits intotinyhumansai:mainfrom
M3gA-Mind:feat/1141-orchestrator-efficiency
May 9, 2026
Merged

perf(agent): orchestrator harness efficiency improvements#1314
senamakel merged 10 commits intotinyhumansai:mainfrom
M3gA-Mind:feat/1141-orchestrator-efficiency

Conversation

@M3gA-Mind
Copy link
Copy Markdown
Contributor

@M3gA-Mind M3gA-Mind commented May 6, 2026

Summary

Implements the changes from the orchestrator harness efficiency audit (#1141) to reduce token waste, tighten delegation behavior, and improve sub-agent context efficiency.

  • Re-enable payload summarizer at 4,000-token threshold (was disabled after recursive dispatch; root cause was omit_skills_catalog = false on the summarizer archetype — fixed by the existing flag)
  • Skip render_tools() for native function-calling: provider already sends JSON schemas in the API payload — the system-prompt tool catalogue was pure duplicate tokens (~1,500–3,000 tokens/turn savings on native sessions)
  • Consolidate 6 memory_tree_* tools → 1 memory_tree with mode enum: shrinks orchestrator's tool surface by 5 entries per API call without losing any capability
  • Remove schedule + spawn_subagent from orchestrator: cron_add/cron_list/cron_remove are the canonical scheduling surface; delegate_* + spawn_worker_thread replace spawn_subagent
  • Remove redundant tables from orchestrator prompt: "Available Sub-Agents" and "Direct Tools" tables removed — LLM sees these as native API tool schemas already
  • Compact render_delegation_guide: single line per toolkit instead of verbose multi-line spawn_subagent snippet
  • Add max_result_chars cap: 8k for planner/researcher, 16k for code_executor — prevents context flooding from large sub-agent outputs
  • Wrap memory_context in Arc<>: ParentExecutionContext::memory_context is now Arc<Option<String>> — spawning a sub-agent shares the allocation instead of cloning potentially large strings
  • docs/DELEGATION_POLICY.md: codifies the 4-tier direct-first decision tree

Test plan

  • cargo check -p openhuman — clean (0 errors)
  • cargo clippy -p openhuman — clean (0 errors)
  • cargo fmt --all --check — clean
  • cargo test -p openhuman --lib — 5,982 pass, 9 pre-existing flaky tests (all pass in isolation: network/port-conflict dependent, unrelated to this change)
  • NODE_ENV=test pnpm --filter openhuman-app compile — clean
  • NODE_ENV=test pnpm --filter openhuman-app lint — 0 errors (32 pre-existing warnings)
  • NODE_ENV=test pnpm --filter openhuman-app format:check — clean
  • Unit tests added: ToolsSection native-skip, MemoryTreeTool dispatcher (5 tests), max_result_chars truncation (2 tests), delegation guide compact format (1 test)
  • All 7 orchestrator prompt.rs tests pass
  • All 4 tools_section tests pass (including 2 new Native/PFormat variants)

Closes #1141

Summary by CodeRabbit

  • Documentation

    • Added a delegation policy guide and updated orchestrator delegation and scheduling guidance.
  • New Features

    • Spawn dedicated worker threads for long-running tasks (returns compact worker references).
    • Cron scheduling: add/list/remove jobs and current-time queries.
    • Unified memory-tree tool with mode-based queries.
    • Per-agent output character limits with truncation indicator.
  • Improvements

    • Re-enabled payload summarization threshold and refined tool scoping to hide unused delegation tools.
  • Tests

    • Added/updated tests covering truncation, memory-tree behavior, and tool rendering.

…ai#1141)

- Re-enable payload summarizer at 4,000-token threshold (was disabled at
  0 after recursive-dispatch was observed; root cause was the missing
  `omit_skills_catalog = true` guard on the summarizer archetype)
- Skip `render_tools()` catalogue for native function-calling sessions
  (provider already sends JSON schemas in the API payload — duplicate in
  the system prompt was pure token bloat)
- Consolidate 6 separate `memory_tree_*` tools into one `memory_tree`
  tool dispatched by `mode` enum, shrinking the orchestrator's tool
  surface by 5 entries per API call
- Remove `schedule` and `spawn_subagent` from orchestrator tool list;
  `cron_add/cron_list/cron_remove` are the canonical scheduling surface
  and `delegate_*` + `spawn_worker_thread` replace `spawn_subagent`
- Remove redundant "Available Sub-Agents" and "Direct Tools" tables from
  orchestrator prompt (LLM already sees these as native API tool schemas)
- Compact `render_delegation_guide` to a single line per toolkit instead
  of a verbose multi-line spawn_subagent snippet
- Add `max_result_chars` to `AgentDefinition`; set 8k for planner and
  researcher, 16k for code_executor to prevent context flooding
- Wrap `ParentExecutionContext::memory_context` in `Arc<Option<String>>`
  so sub-agent spawns share the same allocation instead of cloning
- Add `docs/DELEGATION_POLICY.md` codifying the 4-tier direct-first
  decision tree
- Unit tests for all new behavior (ToolsSection native skip,
  memory_tree dispatcher modes, max_result_chars truncation,
  delegation guide compact format)

Closes tinyhumansai#1141
@M3gA-Mind M3gA-Mind requested a review from a team May 6, 2026 22:12
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 605e0de9-f080-496c-95b9-11a4e96554fb

📥 Commits

Reviewing files that changed from the base of the PR and between f9943f7 and c2ffba9.

📒 Files selected for processing (5)
  • src/openhuman/agent/agents/orchestrator/agent.toml
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/agent/harness/subagent_runner/ops_tests.rs
  • src/openhuman/agent/prompts/mod.rs
  • src/openhuman/agent/triage/escalation.rs
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/agent/prompts/mod.rs
  • src/openhuman/agent/harness/subagent_runner/ops_tests.rs
  • src/openhuman/agent/agents/orchestrator/agent.toml

📝 Walkthrough

Walkthrough

This PR adds per-agent output caps and UTF‑8-safe truncation, consolidates six memory-tree tools into a single mode-dispatching MemoryTreeTool, converts ParentExecutionContext.memory_context to Arc-wrapped ownership, narrows orchestrator delegation/tool surfaces and prompt text, introduces per-turn agent scoping in channel dispatch, re-enables payload-summarizer threshold, and updates tests/docs.

Changes

Delegation Policy & Orchestrator Tooling

Layer / File(s) Summary
Policy Documentation
docs/DELEGATION_POLICY.md
New four-tier delegation decision tree documenting when to use direct replies, direct tools, inline sub-agents, or worker threads, plus anti-patterns.
Orchestrator Prompt & Configuration
src/openhuman/agent/agents/orchestrator/prompt.md, src/openhuman/agent/agents/orchestrator/agent.toml
Prompt standardizes delegate_{toolkit} helpers, calls out spawn_worker_thread for long tasks, consolidates memory-tree guidance to memory_tree(mode: ...), and agent.toml removes spawn_subagent while adding memory_tree, current_time, and cron_*.
Orchestrator Tool Registry & Delegation
src/openhuman/agent/agents/orchestrator/prompt.rs, src/openhuman/tools/orchestrator_tools.rs
Prompt builder emits a ## Connected Integrations block listing only connected toolkits as delegate_{toolkit} entries; tool collector skips unconnected integrations when synthesizing delegate_* tools.
Channel Dispatch Routing & Tool Scoping
src/openhuman/channels/runtime/dispatch.rs
Adds AgentScoping, resolve_target_agent, and build_visible_tool_set, wiring target_agent_id, visible_tool_names, and extra_tools into AgentTurnRequest; welcome turns get connection-state annotation.
Tests & Prompt Integration
src/openhuman/agent/agents/orchestrator/prompt.rs (tests), src/openhuman/agent/prompts/mod_tests.rs
Tests updated to check ## Connected Integrations presence/absence, delegate_* tool names, removal of legacy spawn_subagent wording, and ToolsSection suppression for native tool-call formats.

Sub-agent Output Control

Layer / File(s) Summary
Agent Definition Schema
src/openhuman/agent/harness/definition.rs
Adds optional max_result_chars: Option<usize> to cap per-sub-agent output length.
Parent Execution Context & Memory Ownership
src/openhuman/agent/harness/fork_context.rs, src/openhuman/agent/harness/session/turn.rs, tests
ParentExecutionContext.memory_context changed from Option<String> to Arc<Option<String>>; call sites and tests updated to construct Arc-wrapped values.
Sub-agent Output Truncation
src/openhuman/agent/harness/subagent_runner/ops.rs
run_subagent truncates outcome.output to definition.max_result_chars when set with UTF-8-safe handling and appends "\n[...truncated]".
Payload Summarization Re-enablement
src/openhuman/config/schema/context.rs
summarizer_payload_threshold_tokens default changed from 0 to 4000, re-enabling tool-output summarization for oversized payloads.
Agent Configurations
src/openhuman/agent/agents/code_executor/agent.toml, src/openhuman/agent/agents/planner/agent.toml, src/openhuman/agent/agents/researcher/agent.toml
Per-agent max_result_chars added: code_executor=16000, planner=8000, researcher=8000.
Harness & Test Fixtures
various harness and test files
Test helpers updated to initialize max_result_chars and use Arc-wrapped memory_context; new truncation and summarizer tests validate behavior.

Memory Tree Tool Consolidation

Layer / File(s) Summary
Memory Tree Tool Implementation
src/openhuman/tools/impl/memory/tree/mod.rs
Adds MemoryTreeTool that requires a mode enum (search_entities, query_topic, query_source, query_global, drill_down, fetch_leaves) and dispatches to per-mode handlers; re-exports per-mode structs.
Tool Schema & Dispatch
same file
Tool JSON schema requires mode, validates input, logs selected mode, and delegates to corresponding per-mode implementation; errors on missing/unknown mode.
Tool Registry
src/openhuman/tools/ops.rs
Tool registry now registers single MemoryTreeTool instead of six separate MemoryTree* tools.
Tests
src/openhuman/tools/impl/memory/tree/mod.rs (tests)
Unit tests validate tool name, required mode, presence of all modes, and runtime error handling for missing/invalid mode.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/Channel
    participant Dispatch as Channel Dispatch
    participant DefReg as AgentDefinition Registry
    participant Orch as Orchestrator Agent
    participant SubAgent as Sub-agent Runner
    participant Summarizer as Payload Summarizer

    User->>Dispatch: AgentTurnRequest
    Dispatch->>DefReg: resolve_target_agent()
    DefReg-->>Dispatch: agent definition + connected integrations
    Dispatch->>Orch: AgentTurnRequest + target_agent_id + visible_tool_names + extra_tools
    Orch->>Orch: select delegation target (direct tool, sub-agent, or worker)
    alt Direct Tool Call
        Orch->>Orch: execute tool directly
    else Delegate to Sub-agent
        Orch->>SubAgent: run_subagent(agent_id, prompt, parent_context)
        SubAgent->>DefReg: fetch sub-agent definition (incl. max_result_chars)
        DefReg-->>SubAgent: definition + output cap
        SubAgent->>SubAgent: execute sub-agent logic
        SubAgent->>SubAgent: truncate output if exceeds max_result_chars
        SubAgent->>Summarizer: maybe_summarize(output) if > threshold tokens
        Summarizer-->>SubAgent: summarized or original
        SubAgent-->>Orch: SubagentRunOutcome (possibly truncated/summarized)
    end
    Orch-->>User: final response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • senamakel

"🐰
I stitched the tiers where tasks decide,
Trimmed long replies so outputs hide,
Trees now ask which mode to run,
Threads spawn workers when work weighs a ton,
Hop, summarize, and pass the guide."

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'perf(agent): orchestrator harness efficiency improvements' accurately summarizes the main objective: performance improvements to the orchestrator harness through efficiency enhancements.
Linked Issues check ✅ Passed The PR successfully implements all major requirements from #1141: re-enabled payload summarizer, native function-call tool-skip, consolidated memory_tree tools, removed schedule/spawn_subagent with cron/delegate alternatives, added result-char caps, Arc-wrapped memory contexts, and added DELEGATION_POLICY documentation.
Out of Scope Changes check ✅ Passed All changes are directly aligned with #1141 objectives: efficiency improvements, delegation policy enforcement, tool surface reduction, context optimization, and performance enhancements. No unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/openhuman/agent/agents/orchestrator/prompt.md (1)

55-63: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Worker-thread instructions still reference removed spawn_subagent flow.

This block tells the orchestrator to use spawn_subagent(... dedicated_thread: true), but the orchestrator tool surface now uses spawn_worker_thread. Keeping this text will produce invalid tool plans.

Suggested fix
-`spawn_subagent` accepts an optional `dedicated_thread: true` flag. When set, the
-sub-agent's run is persisted into a fresh **worker**-labeled thread the user can
-open from the thread list, and you receive a compact reference (worker thread id
-+ brief summary) instead of the full sub-agent transcript. Use this **only**
-when the sub-task is genuinely long or complex and the parent thread should not
-be flooded with the sub-agent's output — for example multi-step research,
-multi-file refactors, or batch integration work that produces a large
-transcript. For everyday delegation keep `dedicated_thread` off (the default)
-and surface the result inline.
+Use `spawn_worker_thread` for genuinely long/complex delegated tasks where full
+sub-agent transcripts would flood the parent thread. It creates a persisted
+**worker** thread and returns a compact `[worker_thread_ref]` to the parent.
+For routine delegation, use regular `delegate_*` tools and keep results inline.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/agents/orchestrator/prompt.md` around lines 55 - 63,
Update the documentation text to stop referencing the removed spawn_subagent API
and instead instruct using the current spawn_worker_thread tool and its
equivalent option (e.g., dedicated_thread flag or parameter name used by
spawn_worker_thread); specifically replace mentions of spawn_subagent and
describe the behavior of spawn_worker_thread when invoked with the
dedicated_thread:true behavior (persisting the sub-agent run into a fresh
worker-labeled thread and returning a compact worker thread id + summary), and
ensure examples and guidance mention using spawn_worker_thread for long-running
multi-step tasks while keeping the default inline delegation for everyday
delegations.
src/openhuman/tools/orchestrator_tools.rs (1)

126-159: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

delegate_* tools are generated even for unconnected integrations.

The wildcard expansion ignores integration.connected. If the input slice includes unconnected entries, the tool schema exposes delegates that will fail preflight and drift from prompt guidance.

Suggested fix
                 for integration in connected_integrations {
+                    if !integration.connected {
+                        continue;
+                    }
                     // Slug the toolkit name into a tool-name-safe form.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/orchestrator_tools.rs` around lines 126 - 159, The loop
that creates delegate_* tools (building tool_name via sanitise_slug and pushing
SkillDelegationTool instances) currently includes integrations regardless of
their connection status; modify the logic so only actually connected
integrations produce delegation tools—either filter connected_integrations
beforehand or early-return/continue in the loop when integration.connected is
false, ensuring SkillDelegationTool entries are created only for integrations
where integration.connected == true so preflight and prompt guidance remain
accurate.
src/openhuman/agent/agents/orchestrator/prompt.rs (1)

38-42: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add debug/trace logs around delegation-guide rendering branches.

The changed prompt assembly path now conditionally injects/removes integration guidance but has no diagnostics. Please add stable-prefix debug logs for branch decisions (connected count, section emitted/omitted) to align with the Rust observability guideline.

As per coding guidelines: "Use log / tracing at debug or trace level on RPC entry/exit, error paths, state transitions, and any branch that is hard to infer from tests; include stable prefixes."

Also applies to: 71-89

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/agents/orchestrator/prompt.rs` around lines 38 - 42, Add
stable-prefix debug/trace logging around the delegation-guide rendering
branches: when calling render_delegation_guide(ctx.connected_integrations) and
before/after the trim/append branch, emit a trace/debug log that includes a
stable prefix (e.g. "delegation-guide:") plus ctx.connected_integrations count
and whether the section will be emitted or omitted; do the same for the other
similar block (the branch covering the section at lines referenced in the
review) so there is a clear log on entry, decision (emitted/omitted) and
resulting string length/state. Reference the render_delegation_guide function
and ctx.connected_integrations in the log messages and use the project logging
crate (log/tracing) at debug or trace level per guidelines.
src/openhuman/channels/runtime/dispatch.rs (1)

441-449: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing timeout on fetch_connected_integrations may block turn dispatch.

build_connection_state_block (lines 232-245) wraps fetch_connected_integrations in a 3-second timeout to prevent Composio API latency from blocking the turn. However, this call at line 442 has no such protection. If the Composio API is slow or unresponsive, this could delay agent routing indefinitely.

Proposed fix: wrap in timeout with fallback
     let extra_tools = if !definition.subagents.is_empty() {
-        let connected = fetch_connected_integrations(&config).await;
+        const COMPOSIO_FETCH_TIMEOUT_SECS: u64 = 3;
+        let connected = match tokio::time::timeout(
+            Duration::from_secs(COMPOSIO_FETCH_TIMEOUT_SECS),
+            fetch_connected_integrations(&config),
+        )
+        .await
+        {
+            Ok(list) => list,
+            Err(_) => {
+                tracing::warn!(
+                    channel = %channel,
+                    target_agent = target_id,
+                    "[dispatch::routing] Composio fetch timed out — proceeding without connected integrations"
+                );
+                Vec::new()
+            }
+        };
         tracing::debug!(
             channel = %channel,
             target_agent = target_id,
             connected_integration_count = connected.len(),
             "[dispatch::routing] fetched connected integrations for delegation expansion"
         );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/channels/runtime/dispatch.rs` around lines 441 - 449,
fetch_connected_integrations at dispatch.rs (used when computing extra_tools)
currently awaits without a timeout and can block routing; wrap the call to
fetch_connected_integrations(&config).await in a tokio::time::timeout (3s, same
as build_connection_state_block) and if it times out or errors, log a debug/warn
including channel and target_agent (same fields used in the tracing::debug) and
fall back to an empty Vec (or the previous safe default) before calling
orchestrator_tools::collect_orchestrator_tools(definition, registry,
&connected); ensure you handle both Ok(result) and Err(timeout) paths so
extra_tools is always computed promptly.
🧹 Nitpick comments (2)
src/openhuman/agent/harness/subagent_runner/ops_tests.rs (1)

641-668: ⚡ Quick win

max_result_chars tests don’t exercise the runner path.

These tests currently reimplement truncation locally, so they can pass even if run_subagent truncation regresses. Please drive the assertion through run_subagent with def.max_result_chars = Some(...) / None to validate the real behavior end-to-end in this harness.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs` around lines 641 -
668, Replace the local truncation checks in the tests with end-to-end calls to
run_subagent and set the subagent definition's max_result_chars field (e.g.,
def.max_result_chars = Some(10) for the truncation test and def.max_result_chars
= None for the non‑truncation test), then assert on the actual output returned
by run_subagent (or the runner result structure) instead of duplicating the
truncate logic; update the tests max_result_chars_cap_is_enforced and
max_result_chars_not_applied_when_none to build the minimal SubagentDef/inputs
needed, call run_subagent, and validate that the returned output starts/ends or
equals the expected values.
src/openhuman/channels/runtime/dispatch.rs (1)

506-519: 💤 Low value

Consider extracting tests to a sibling dispatch_tests.rs module.

The file is ~1,425 lines, exceeding the ~500 line guideline. The scoping_tests and tests modules together account for ~400 lines. Moving them to a dedicated dispatch_tests.rs (or tests/dispatch_tests.rs) would keep the main module focused on operational code per the coding guidelines.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/channels/runtime/dispatch.rs` around lines 506 - 519, The
file-level tests (the #[cfg(test)] mod scoping_tests and the large tests module)
should be extracted into a sibling test module file named dispatch_tests.rs to
keep the main module under ~500 lines: create dispatch_tests.rs containing the
scoping_tests and tests modules, move any helper imports from the parent (adjust
use paths from super:: or crate:: as needed), ensure the moved tests reference
symbols like scoping_tests,
runtime_dispatch::dispatch_routes_through_agent_run_turn_bus_handler, and any
private items by either making those items pub(crate) or re-exporting needed
helpers, and then replace the in-file test modules with a single mod
dispatch_tests; declaration guarded by #[cfg(test)] so the tests compile and run
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/agents/orchestrator/prompt.rs`:
- Around line 84-85: render_delegation_guide currently interpolates ci.toolkit
directly into "delegate_{ci.toolkit}", which can mismatch runtime-synthesized
tool names that are sanitized; change it to compute and use the same sanitizer
used at tool creation (e.g. let sanitized = sanitize_toolkit_slug(&ci.toolkit)
or call canonicalize_toolkit_name(&ci.toolkit)) and interpolate
"delegate_{sanitized}" instead of raw ci.toolkit, and import/use the existing
sanitizer function used elsewhere so prompt names always match runtime tool
names.

In `@src/openhuman/agent/harness/subagent_runner/ops.rs`:
- Around line 108-117: The truncation uses byte-based APIs causing potential
UTF-8 boundary panics and misreports character count: compute the original
character count with outcome.output.chars().count(), log that value instead of
outcome.output.len(), and when truncating use a character-safe byte index (e.g.,
find the byte offset with outcome.output.char_indices().nth(cap).map(|(i,_)|
i).unwrap_or(outcome.output.len())) then call
outcome.output.truncate(byte_offset) and append the "[...truncated]" suffix;
update the tracing::debug call to use the char count and the max_result_chars
(definition.max_result_chars) for accurate logging.

In `@src/openhuman/agent/prompts/mod.rs`:
- Around line 398-403: The current early return when ctx.tool_call_format ==
ToolCallFormat::Native discards any non-empty dispatcher_instructions; change
this logic to only suppress dispatcher_instructions for Native mode when
ctx.dispatcher_instructions (produced by
NativeToolDispatcher.prompt_instructions()) is empty, otherwise include/pass
through the dispatcher_instructions into the prompt; keep the token-saving
behavior for truly empty dispatcher_instructions but ensure
ctx.tool_call_format, ToolCallFormat::Native, and dispatcher_instructions are
checked and used accordingly.

---

Outside diff comments:
In `@src/openhuman/agent/agents/orchestrator/prompt.md`:
- Around line 55-63: Update the documentation text to stop referencing the
removed spawn_subagent API and instead instruct using the current
spawn_worker_thread tool and its equivalent option (e.g., dedicated_thread flag
or parameter name used by spawn_worker_thread); specifically replace mentions of
spawn_subagent and describe the behavior of spawn_worker_thread when invoked
with the dedicated_thread:true behavior (persisting the sub-agent run into a
fresh worker-labeled thread and returning a compact worker thread id + summary),
and ensure examples and guidance mention using spawn_worker_thread for
long-running multi-step tasks while keeping the default inline delegation for
everyday delegations.

In `@src/openhuman/agent/agents/orchestrator/prompt.rs`:
- Around line 38-42: Add stable-prefix debug/trace logging around the
delegation-guide rendering branches: when calling
render_delegation_guide(ctx.connected_integrations) and before/after the
trim/append branch, emit a trace/debug log that includes a stable prefix (e.g.
"delegation-guide:") plus ctx.connected_integrations count and whether the
section will be emitted or omitted; do the same for the other similar block (the
branch covering the section at lines referenced in the review) so there is a
clear log on entry, decision (emitted/omitted) and resulting string
length/state. Reference the render_delegation_guide function and
ctx.connected_integrations in the log messages and use the project logging crate
(log/tracing) at debug or trace level per guidelines.

In `@src/openhuman/channels/runtime/dispatch.rs`:
- Around line 441-449: fetch_connected_integrations at dispatch.rs (used when
computing extra_tools) currently awaits without a timeout and can block routing;
wrap the call to fetch_connected_integrations(&config).await in a
tokio::time::timeout (3s, same as build_connection_state_block) and if it times
out or errors, log a debug/warn including channel and target_agent (same fields
used in the tracing::debug) and fall back to an empty Vec (or the previous safe
default) before calling
orchestrator_tools::collect_orchestrator_tools(definition, registry,
&connected); ensure you handle both Ok(result) and Err(timeout) paths so
extra_tools is always computed promptly.

In `@src/openhuman/tools/orchestrator_tools.rs`:
- Around line 126-159: The loop that creates delegate_* tools (building
tool_name via sanitise_slug and pushing SkillDelegationTool instances) currently
includes integrations regardless of their connection status; modify the logic so
only actually connected integrations produce delegation tools—either filter
connected_integrations beforehand or early-return/continue in the loop when
integration.connected is false, ensuring SkillDelegationTool entries are created
only for integrations where integration.connected == true so preflight and
prompt guidance remain accurate.

---

Nitpick comments:
In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs`:
- Around line 641-668: Replace the local truncation checks in the tests with
end-to-end calls to run_subagent and set the subagent definition's
max_result_chars field (e.g., def.max_result_chars = Some(10) for the truncation
test and def.max_result_chars = None for the non‑truncation test), then assert
on the actual output returned by run_subagent (or the runner result structure)
instead of duplicating the truncate logic; update the tests
max_result_chars_cap_is_enforced and max_result_chars_not_applied_when_none to
build the minimal SubagentDef/inputs needed, call run_subagent, and validate
that the returned output starts/ends or equals the expected values.

In `@src/openhuman/channels/runtime/dispatch.rs`:
- Around line 506-519: The file-level tests (the #[cfg(test)] mod scoping_tests
and the large tests module) should be extracted into a sibling test module file
named dispatch_tests.rs to keep the main module under ~500 lines: create
dispatch_tests.rs containing the scoping_tests and tests modules, move any
helper imports from the parent (adjust use paths from super:: or crate:: as
needed), ensure the moved tests reference symbols like scoping_tests,
runtime_dispatch::dispatch_routes_through_agent_run_turn_bus_handler, and any
private items by either making those items pub(crate) or re-exporting needed
helpers, and then replace the in-file test modules with a single mod
dispatch_tests; declaration guarded by #[cfg(test)] so the tests compile and run
unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aca83efd-844a-4c0c-8e07-8c280c801a6f

📥 Commits

Reviewing files that changed from the base of the PR and between abec992 and a1ac5b3.

📒 Files selected for processing (25)
  • docs/DELEGATION_POLICY.md
  • src/openhuman/agent/agents/code_executor/agent.toml
  • src/openhuman/agent/agents/orchestrator/agent.toml
  • src/openhuman/agent/agents/orchestrator/prompt.md
  • src/openhuman/agent/agents/orchestrator/prompt.rs
  • src/openhuman/agent/agents/planner/agent.toml
  • src/openhuman/agent/agents/researcher/agent.toml
  • src/openhuman/agent/harness/definition.rs
  • src/openhuman/agent/harness/definition_tests.rs
  • src/openhuman/agent/harness/fork_context.rs
  • src/openhuman/agent/harness/payload_summarizer.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/agent/harness/subagent_runner/ops_tests.rs
  • src/openhuman/agent/prompts/mod.rs
  • src/openhuman/agent/prompts/mod_tests.rs
  • src/openhuman/agent/triage/escalation.rs
  • src/openhuman/channels/runtime/dispatch.rs
  • src/openhuman/config/schema/context.rs
  • src/openhuman/tools/impl/agent/spawn_worker_thread.rs
  • src/openhuman/tools/impl/memory/tree/mod.rs
  • src/openhuman/tools/ops.rs
  • src/openhuman/tools/orchestrator_tools.rs
  • tests/agent_harness_public.rs
  • tests/calendar_grounding_e2e.rs

Comment thread src/openhuman/agent/agents/orchestrator/prompt.rs Outdated
Comment thread src/openhuman/agent/harness/subagent_runner/ops.rs
Comment thread src/openhuman/agent/prompts/mod.rs
Copy link
Copy Markdown
Member

@senamakel senamakel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve coderabbit issues and checks

M3gA-Mind added 6 commits May 7, 2026 11:49
…urface changes

spawn_subagent was removed in tinyhumansai#1141; spawn_worker_thread and memory_tree
are the new entries the test should assert on. Pins the new contract so
future TOML edits can't silently revert it.
String::len() counts bytes; String::truncate(n) panics if n lands in the
middle of a multi-byte UTF-8 sequence. Switch to chars().count() for the
guard and char_indices().nth(cap) to find the safe byte offset before
truncating. Also fix the tracing::debug log field — original_chars now
correctly counts characters, not bytes.

Fixes @coderabbitai comment on ops.rs:117.
… tools

render_delegation_guide was printing raw ci.toolkit as the delegate_*
tool name, but collect_orchestrator_tools sanitises toolkit slugs via
sanitise_slug() first. A quirky slug (e.g. "Google Calendar" with a
space) would print a tool name in the prompt that never exists in the
function-calling schema. Fix: export sanitise_slug as pub(crate) and use
it in render_delegation_guide so prompt and schema always agree.

Also:
- Skip unconnected integrations in collect_orchestrator_tools (previously
  delegate_* tools were synthesised for disconnected toolkits, whose
  pre-flight check would immediately reject any call).
- Add tracing::debug log lines around the delegation-guide rendering
  branches (connected count, emit/omit decision, section size).
- Update prompt.md to remove stale spawn_subagent reference in "Dedicated
  worker threads" section — the tool was removed in tinyhumansai#1141.
- Update prompt.rs module docstring accordingly.

Addresses @coderabbitai comments on prompt.rs:85, orchestrator_tools.rs:126-159,
prompt.md:55-63, and prompt.rs:38-42.
…rmat

The ToolsSection early-return on ToolCallFormat::Native discarded any
non-empty dispatcher_instructions. NativeToolDispatcher::prompt_instructions()
returns a non-empty "## Tool Use Protocol" block that models need in order
to understand they should emit native tool calls rather than narrate intent.

New behaviour: if dispatcher_instructions is non-empty for Native mode,
return it directly (the tool catalogue is still omitted to save tokens);
if empty, return String::new() as before.

Adds a test to pin the new contract.

Addresses @coderabbitai comment on prompts/mod.rs:398-403.
…ing path

build_connection_state_block already wraps its fetch in a 3-second
timeout to prevent Composio API latency from blocking turn processing,
but the fetch at the delegation-tools expansion site (line ~442) had no
such protection. Slow or unresponsive Composio API could block agent
routing indefinitely.

Wrap with the same 3s tokio::time::timeout and fall back to Vec::new()
on timeout, matching the pattern used elsewhere in this file.

Addresses @coderabbitai comment on dispatch.rs:441-449.
…ed-integration tests

- ops_tests.rs: add max_result_chars_cap_is_char_safe_for_multibyte to
  prove truncation at a multi-byte codepoint boundary does not panic;
  update existing tests to use the char-safe logic matching ops.rs.
- orchestrator_tools.rs: add unconnected_integrations_are_skipped to pin
  the new connected-only filtering contract.

Addresses @coderabbitai nitpick on ops_tests.rs:641-668.
@M3gA-Mind
Copy link
Copy Markdown
Contributor Author

Yes..Working on those itself.

Copy link
Copy Markdown
Contributor Author

@M3gA-Mind M3gA-Mind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pr-manager: all actionable CodeRabbit comments addressed in 5 commits (b47a984..97e5c91).

Deferred (low value — Standards): @coderabbitai suggested extracting the #[cfg(test)] modules from src/openhuman/channels/runtime/dispatch.rs into a sibling dispatch_tests.rs to keep the file under the ~500-line guideline. The file is currently ~1,425 lines. This is a valid long-term concern but is a pure refactor with no behavioural change and no CI impact. The right moment to do this split is when the file is next substantially modified. Deciding factor: do it then, not now.

@M3gA-Mind M3gA-Mind requested a review from senamakel May 7, 2026 06:26
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/openhuman/agent/harness/subagent_runner/ops_tests.rs (1)

640-703: ⚡ Quick win

Split the new truncation tests into a dedicated test module/file.

This test file now exceeds ~500 lines and keeps growing. Moving max_result_chars_* cases into a sibling test module will keep this suite maintainable.

As per coding guidelines: "Keep source files to ≤ ~500 lines per file; split modules when growing larger."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs` around lines 640 -
703, Move the three truncation tests (max_result_chars_cap_is_enforced,
max_result_chars_cap_is_char_safe_for_multibyte,
max_result_chars_not_applied_when_none) out of the large ops_tests.rs into a new
dedicated test file (e.g., create tests/ops_truncation.rs) and delete them from
the original file; copy the test functions verbatim into the new file as
standalone #[test] functions so they run as integration tests (no extra module
imports needed), ensuring the behavior and names remain unchanged and the
original ops_tests.rs file stays under ~500 lines.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/agents/orchestrator/prompt.md`:
- Line 33: The prompt contains conflicting delegation terminology: one spot
recommends using delegate_{toolkit} for external integrations while the decision
tree still references integrations_agent; update the decision tree text and any
examples in prompt.md to consistently use the delegate_{toolkit} pattern
(replace mentions of integrations_agent with delegate_{toolkit}), and ensure
related lines also clarify using spawn_worker_thread for long-running tasks and
delegate_archivist for memory writes so all delegation names
(delegate_researcher, delegate_run_code, delegate_plan, delegate_critic,
delegate_archivist, delegate_{toolkit}, spawn_worker_thread) are used
consistently in the decision flow.

In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs`:
- Around line 641-702: Replace the duplicated truncation logic in these tests by
exercising the real runner path: call run_subagent with a Definition where
def.max_result_chars is set (for the cap tests) or None (for the no-cap test),
mock the provider response to return the long/multibyte strings used in the
current tests, then assert on the returned outcome.output (not a
locally-truncated string) that it is truncated to the expected character count,
ends with "[...truncated]" and handles multibyte chars safely; use the existing
test names but invoke run_subagent and check outcome.output to ensure ops.rs’s
truncation logic is actually exercised.

---

Nitpick comments:
In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs`:
- Around line 640-703: Move the three truncation tests
(max_result_chars_cap_is_enforced,
max_result_chars_cap_is_char_safe_for_multibyte,
max_result_chars_not_applied_when_none) out of the large ops_tests.rs into a new
dedicated test file (e.g., create tests/ops_truncation.rs) and delete them from
the original file; copy the test functions verbatim into the new file as
standalone #[test] functions so they run as integration tests (no extra module
imports needed), ensuring the behavior and names remain unchanged and the
original ops_tests.rs file stays under ~500 lines.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9ad2b5e6-49e8-49ef-9f2a-55aac9da848b

📥 Commits

Reviewing files that changed from the base of the PR and between a1ac5b3 and 97e5c91.

📒 Files selected for processing (9)
  • src/openhuman/agent/agents/loader.rs
  • src/openhuman/agent/agents/orchestrator/prompt.md
  • src/openhuman/agent/agents/orchestrator/prompt.rs
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/agent/harness/subagent_runner/ops_tests.rs
  • src/openhuman/agent/prompts/mod.rs
  • src/openhuman/agent/prompts/mod_tests.rs
  • src/openhuman/channels/runtime/dispatch.rs
  • src/openhuman/tools/orchestrator_tools.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/openhuman/tools/orchestrator_tools.rs
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/channels/runtime/dispatch.rs

Comment thread src/openhuman/agent/agents/orchestrator/prompt.md
Comment thread src/openhuman/agent/harness/subagent_runner/ops_tests.rs Outdated
…ed tool

tinyhumansai#1141 consolidated 6 individual memory_tree_* tools into a single
memory_tree tool with mode dispatch. The integration test in
tests/agent_retrieval_e2e.rs still asserted all 6 old names existed in
the orchestrator TOML, causing CI to fail.

Update the test to:
- Assert memory_tree IS present (the new consolidated tool)
- Assert none of the old individual names are present (pinning removal)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/agent_retrieval_e2e.rs (1)

94-111: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Strengthen the memory_tree assertion to check exact tool entries

Line 95 currently does a broad substring match, so this can pass even if memory_tree appears only in comments/text and not as a real tool entry. Prefer exact entry checks in the TOML list.

Proposed fix
     let toml = include_str!("../src/openhuman/agent/agents/orchestrator/agent.toml");
-    assert!(
-        toml.contains("memory_tree"),
-        "orchestrator agent.toml must list 'memory_tree' so the LLM can call it"
-    );
+    let has_memory_tree_entry = toml
+        .lines()
+        .map(str::trim)
+        .any(|line| line == "\"memory_tree\"" || line == "\"memory_tree\",");
+    assert!(
+        has_memory_tree_entry,
+        "orchestrator agent.toml must list 'memory_tree' so the LLM can call it"
+    );
@@
     for old_name in [
         "memory_tree_search_entities",
         "memory_tree_query_topic",
         "memory_tree_query_source",
         "memory_tree_query_global",
         "memory_tree_drill_down",
         "memory_tree_fetch_leaves",
     ] {
+        let old_entry = format!("\"{old_name}\"");
+        let old_entry_comma = format!("\"{old_name}\",");
+        let old_tool_present = toml
+            .lines()
+            .map(str::trim)
+            .any(|line| line == old_entry || line == old_entry_comma);
         assert!(
-            !toml.contains(old_name),
+            !old_tool_present,
             "orchestrator agent.toml must NOT list '{old_name}' — removed in `#1141` (use 'memory_tree' with mode= dispatch)"
         );
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/agent_retrieval_e2e.rs` around lines 94 - 111, The current assert uses
toml.contains("memory_tree") which can match comments or other text; instead
parse or precisely match the tool table entry for the orchestrator to ensure a
real tool is declared. Update the test around the toml variable to either (a)
parse the TOML string and assert there is a tool entry with name ==
"memory_tree" (using the toml crate or a small parser), or (b) assert the exact
snippet that represents the tool entry exists (for example the table/array entry
that contains name = "memory_tree" and any required fields), keeping the
existing negative checks for the old_name array intact; reference the toml
variable and the loop over old_name in tests/agent_retrieval_e2e.rs when making
the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/agent_retrieval_e2e.rs`:
- Around line 94-111: The current assert uses toml.contains("memory_tree") which
can match comments or other text; instead parse or precisely match the tool
table entry for the orchestrator to ensure a real tool is declared. Update the
test around the toml variable to either (a) parse the TOML string and assert
there is a tool entry with name == "memory_tree" (using the toml crate or a
small parser), or (b) assert the exact snippet that represents the tool entry
exists (for example the table/array entry that contains name = "memory_tree" and
any required fields), keeping the existing negative checks for the old_name
array intact; reference the toml variable and the loop over old_name in
tests/agent_retrieval_e2e.rs when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8eba3497-f2bb-43d8-a5b9-c7faaef1f6f1

📥 Commits

Reviewing files that changed from the base of the PR and between 97e5c91 and da54bb7.

📒 Files selected for processing (1)
  • tests/agent_retrieval_e2e.rs

@senamakel senamakel requested a review from sanil-23 May 7, 2026 06:52
- orchestrator/prompt.md: replace `integrations_agent` references in
  the decision tree with the actual synthesised `delegate_{toolkit}` /
  `delegate_run_code` / `delegate_plan` etc. tool names so the prompt
  matches what the LLM sees in its function-calling schema
- tests/agent_retrieval_e2e.rs: strengthen memory_tree tool assertion to
  use exact line matching (`"memory_tree"` / `"memory_tree",`) rather
  than a substring match that could hit comments or prefixed names
- subagent_runner/ops_truncation_tests.rs: extract the three
  max_result_chars truncation tests into their own file, keeping
  ops_tests.rs under the 500-line guideline threshold
@senamakel
Copy link
Copy Markdown
Member

@sanil-23 pls review

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 7, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/openhuman/agent/harness/subagent_runner/ops_truncation_tests.rs (1)

7-67: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

These tests validate duplicated logic, not the production run_subagent truncation path.

Right now they can stay green even if src/openhuman/agent/harness/subagent_runner/ops.rs truncation regresses. Please drive these assertions through run_subagent (def.max_result_chars + provider response + outcome.output) instead of re-implementing truncation in-test.

Suggested direction
-#[test]
-fn max_result_chars_cap_is_enforced() {
-    // local truncation implementation...
-}
+#[tokio::test]
+async fn max_result_chars_cap_is_enforced() {
+    // Arrange parent/provider + def.max_result_chars = Some(10)
+    // Act: run_subagent(...)
+    // Assert: outcome.output == "hello worl\n[...truncated]"
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/harness/subagent_runner/ops_truncation_tests.rs` around
lines 7 - 67, Tests currently re-implement truncation logic locally instead of
exercising the production path; update them to call run_subagent and assert on
the returned Outcome (outcome.output) when def.max_result_chars is set and the
mocked provider response is long, so truncation happens in
src/openhuman/agent/harness/subagent_runner/ops.rs; specifically, replace the
inline truncation in the three tests with setups that (1) configure a
SubagentDef with max_result_chars = Some(cap), (2) mock the provider response to
return the long string, (3) invoke run_subagent, and (4) assert the
outcome.output length/content and that it ends with "[...truncated]" (also
include the case cap = None to ensure no truncation).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/openhuman/agent/harness/subagent_runner/ops_truncation_tests.rs`:
- Around line 7-67: Tests currently re-implement truncation logic locally
instead of exercising the production path; update them to call run_subagent and
assert on the returned Outcome (outcome.output) when def.max_result_chars is set
and the mocked provider response is long, so truncation happens in
src/openhuman/agent/harness/subagent_runner/ops.rs; specifically, replace the
inline truncation in the three tests with setups that (1) configure a
SubagentDef with max_result_chars = Some(cap), (2) mock the provider response to
return the long string, (3) invoke run_subagent, and (4) assert the
outcome.output length/content and that it ends with "[...truncated]" (also
include the case cap = None to ensure no truncation).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8363f936-7bb7-43af-a5b8-8b0ec016f509

📥 Commits

Reviewing files that changed from the base of the PR and between da54bb7 and f9943f7.

📒 Files selected for processing (5)
  • src/openhuman/agent/agents/orchestrator/prompt.md
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/agent/harness/subagent_runner/ops_tests.rs
  • src/openhuman/agent/harness/subagent_runner/ops_truncation_tests.rs
  • tests/agent_retrieval_e2e.rs
✅ Files skipped from review due to trivial changes (1)
  • tests/agent_retrieval_e2e.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/agent/agents/orchestrator/prompt.md

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 7, 2026
- If complex multi-step decomposition is required, use `delegate_plan`.
- If code review is requested, use `delegate_critic`.
- If memory archiving or distillation is required, use `delegate_archivist`.
4. **After delegation**, summarise results clearly and concisely.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drift risk: archetype names in this prose can diverge from the synthesised delegate_* tools.

Step 3 here lists delegate_run_code, delegate_researcher, delegate_plan, delegate_critic, delegate_archivist as hardcoded markdown. The summary line below repeats them. Meanwhile the source of truth for which delegate_* tools actually exist is agents/orchestrator/agent.toml's subagents = [...] field, expanded into tool schemas by collect_orchestrator_tools.

If anyone later removes (say) archivist from subagents, the synthesised delegate_archivist tool disappears from the API tool schema, but this prompt still tells the LLM to "use delegate_archivist for memory writes." The model attempts to call a tool that doesn't exist.

This PR did the right thing for integrationsrender_delegation_guide derives the ## Connected Integrations block from runtime data, so prompt and schema cannot drift. But archetypes still go through hardcoded prose.

Two options:

  1. Make render_delegation_guide also iterate definition.subagents archetype entries and emit one line per archetype using each target's when_to_use. Then archetypes and integrations share one drift-free source — same guarantee this PR already gives integrations.
  2. Cheap fallback: add a startup test that asserts every delegate_<archetype> literal mentioned in prompt.md appears in the synthesised orchestrator tool set. Catches drift in CI without restructuring the prompt.

Either is acceptable; option 1 is the principled fix.

Only the integrations listed below are currently authorised. \
If the user asks about another service, tell them to connect \
it in **Skills** page before retrying.\n\n",
"## Connected Integrations\n\n\
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for you: now that each connected toolkit gets its own delegate_{toolkit} tool with auto-generated description, is the per-bullet ## Connected Integrations block still pulling its weight?

Each connected toolkit is described in two places after this PR:

  1. The delegate_{toolkit} tool's schema description in orchestrator_tools.rs:142-150 (sent to the API on every request)
  2. The bullet line this function renders into the system prompt

Both carry the toolkit name, the tool to call, and the description. The only unique signal this prompt block adds is the naming-pattern hint at the top (Delegate tasks for these services using the matching delegate_{toolkit} tool).

The rest of this PR's token wins (skipping render_tools() for native, dropping the "Available Sub-Agents" + "Direct Tools" tables) are explicitly motivated by don't repeat the schema in prose. The same logic applies here, but this PR keeps the prose listing.

Two coherent shapes — picking either is fine; picking both (this PR's current state) is the worst of both worlds:

Option A — Drop the per-toolkit list, keep just the naming-pattern hint.
Replace this entire function's output with one line in the rules section:

External integrations are exposed as delegate_{toolkit} tools — only currently-connected toolkits are present in your tool list.

Saves ~30–50 prompt tokens per connected integration on every cache miss. Loses nothing the schema doesn't already encode. Consistent with Wins 2 and 5 of this PR.

Option B — Drop the per-toolkit delegate_* tools, go back to one generic spawn_subagent-style tool with a toolkit parameter, and keep the prompt list.
Cheaper in tool-schema bytes (one schema instead of N), but loses the tool-selection-accuracy win that justifies Step 4 of this PR. Effectively reverses part of the refactor.

Which way did you intend? If A, this function shrinks to a single static line. If B, the synthesis logic in orchestrator_tools.rs simplifies.

# Conflicts:
#	src/openhuman/agent/agents/orchestrator/agent.toml
#	src/openhuman/tools/ops.rs
@senamakel senamakel merged commit 8d9a1ca into tinyhumansai:main May 9, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Audit orchestrator harness efficiency and reduce unnecessary delegation/token waste

3 participants