Merged
31 changes: 31 additions & 0 deletions docs/DELEGATION_POLICY.md
@@ -0,0 +1,31 @@
# Delegation Policy

## When to delegate vs. act directly

The orchestrator follows a direct-first policy. This document codifies the four-tier decision tree the orchestrator applies to every user message.

## Tier 1 — Reply directly (no tools)
Apply when: small talk, simple factual Q&A, acknowledgements, clarification requests, or questions answerable from context already in the system prompt.
Cost: no tool-call overhead (output tokens only).
Rule: if you can answer without calling any tool, do so.

## Tier 2 — Use a direct tool
Apply when: the task needs a tool but not specialised execution (time lookup, memory read/write, cron scheduling, workspace state, listing connections).
Cost: 1 tool call + parse overhead (~200-400 tokens).
Rule: prefer `current_time`, `cron_*`, `memory_*`, `memory_tree`, `read_workspace_state`, `composio_list_connections`, `ask_user_clarification`.

## Tier 3 — Spawn a sub-agent (inline)
Apply when: the task requires specialised execution (writing code, crawling docs, running shell, calling an external integration) that the orchestrator cannot do directly.
Cost: full sub-agent turn (~1-5k tokens depending on archetype).
Rule: spawn the narrowest archetype that can complete the task. Prefer inline delegation through the matching `delegate_*` tool (e.g. `delegate_run_code`, `delegate_researcher`) for tasks expected to finish in fewer than 5 turns.

## Tier 4 — Spawn a dedicated worker thread
Apply when: the task is long (>5 turns estimated), produces a large transcript, or the user explicitly wants it tracked as a separate thread.
Cost: similar to Tier 3, but the parent thread receives only a compact reference (thread id + brief summary) instead of the full sub-agent transcript.
Rule: use `spawn_worker_thread` and surface a brief summary back to the parent. Do not chain workers (workers cannot spawn workers).

## Anti-patterns to avoid
- Spawning a sub-agent to answer a question the orchestrator already has context for.
- Delegating to a sub-agent when Tier 1 or Tier 2 already covers the request.
- Using `spawn_subagent` when `delegate_{archetype}` covers the task — `delegate_*` tools carry the full archetype definition and have correct tool filtering pre-configured.
- Passing the entire parent conversation as context to a sub-agent — pass only the task-relevant slice.
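The tiers read as an ordered decision procedure: check the cheapest tier first and stop at the first one that applies. A minimal Rust sketch, assuming a hypothetical `Request` struct that summarises what a message needs (in practice the model makes this judgement from its prompt; no such struct is implied to exist in the codebase):

```rust
/// The four tiers, in the order the orchestrator checks them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    ReplyDirectly,  // Tier 1: no tools
    DirectTool,     // Tier 2: current_time, cron_*, memory_*, ...
    InlineDelegate, // Tier 3: matching delegate_* tool, result inline
    WorkerThread,   // Tier 4: spawn_worker_thread
}

/// Hypothetical summary of what a user message needs.
#[derive(Debug, Clone, Copy)]
pub struct Request {
    pub needs_tool: bool,
    pub needs_specialist: bool, // code, crawling, shell, external integration
    pub estimated_turns: u32,
    pub large_transcript: bool,
    pub user_wants_thread: bool,
}

/// Returns the first (cheapest) tier that applies.
pub fn choose_tier(req: &Request) -> Tier {
    if !req.needs_specialist {
        // Tiers 1 and 2: the orchestrator handles the request itself.
        return if req.needs_tool {
            Tier::DirectTool
        } else {
            Tier::ReplyDirectly
        };
    }
    // The policy says "<5 turns" for Tier 3 and ">5 turns" for Tier 4;
    // this sketch treats exactly 5 turns as inline.
    if req.estimated_turns > 5 || req.large_transcript || req.user_wants_thread {
        Tier::WorkerThread
    } else {
        Tier::InlineDelegate
    }
}
```

Checking `needs_specialist` first encodes the direct-first bias: delegation is only considered once Tiers 1 and 2 are ruled out.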
1 change: 1 addition & 0 deletions src/openhuman/agent/agents/code_executor/agent.toml
@@ -4,6 +4,7 @@ delegate_name = "run_code"
when_to_use = "Sandboxed developer — writes, runs, and debugs code until tests pass. Use for any task that requires producing or modifying source files and exercising them with shell or test commands."
temperature = 0.4
max_iterations = 10
max_result_chars = 16000
sandbox_mode = "sandboxed"
omit_identity = true
omit_memory_context = true
15 changes: 14 additions & 1 deletion src/openhuman/agent/agents/loader.rs
@@ -301,7 +301,20 @@ mod tests {
assert!(matches!(def.model, ModelSpec::Hint(ref h) if h == "reasoning"));
match def.tools {
ToolScope::Named(tools) => {
assert!(tools.iter().any(|t| t == "spawn_subagent"));
// spawn_subagent was removed in #1141; spawn_worker_thread is the replacement
assert!(
tools.iter().any(|t| t == "spawn_worker_thread"),
"orchestrator must have spawn_worker_thread"
);
assert!(
!tools.iter().any(|t| t == "spawn_subagent"),
"spawn_subagent must not appear — removed in #1141"
);
// consolidated memory_tree* → single memory_tree with mode dispatch
assert!(
tools.iter().any(|t| t == "memory_tree"),
"orchestrator must have memory_tree"
);
assert!(!tools.iter().any(|t| t == "shell"));
assert!(!tools.iter().any(|t| t == "file_write"));
}
20 changes: 5 additions & 15 deletions src/openhuman/agent/agents/orchestrator/agent.toml
@@ -59,9 +59,7 @@ hint = "reasoning"

[tools]
# Direct tools — things the orchestrator calls itself rather than
# delegating. `spawn_subagent` is retained as an advanced fallback so
# power users can still spawn arbitrary agent ids that are not listed
# in `subagents` above (e.g. workspace-override custom agents).
# delegating.
#
# `composio_list_connections` is the orchestrator's only composio_*
# tool: it exists so the agent can detect newly-authorised integrations
@@ -74,38 +72,30 @@ named = [
"query_memory",
"memory_store",
"memory_forget",
"memory_tree_search_entities",
"memory_tree_query_topic",
"memory_tree_query_source",
"memory_tree_query_global",
"memory_tree_drill_down",
"memory_tree_fetch_leaves",
"memory_tree",
# WhatsApp local-data tools (issue #1341). The scanner ingests chats
# and messages into `whatsapp_data.db` on the user's machine; these
# three read-only tools let the orchestrator quote, summarise and
# search that local data without exposing the scanner's
# `whatsapp_data_ingest` write-path. Pair with the `memory_tree_*`
# tools above for cross-source / action-item flows once the scanner
# `whatsapp_data_ingest` write-path. Pair with the `memory_tree`
# tool above for cross-source / action-item flows once the scanner
# also forwards messages into the memory tree.
"whatsapp_data_list_chats",
"whatsapp_data_list_messages",
"whatsapp_data_search_messages",
"read_workspace_state",
"ask_user_clarification",
"spawn_subagent",
"spawn_worker_thread",
"composio_list_connections",
# Time + scheduling — lets the orchestrator answer "what time is it",
# "remind me in 10 minutes", "every morning at 8" directly rather than
# delegating or telling the user it can't. `current_time` grounds
# relative-time parsing; `cron_add` / `cron_list` / `cron_remove`
# manage recurring + one-shot agent/shell jobs; `schedule` is the
# simpler shell-only alias for one-shot reminders.
# manage recurring + one-shot agent/shell jobs.
"current_time",
"cron_add",
"cron_list",
"cron_remove",
"schedule",
# Coding-harness coordination primitives from #1208. `todowrite`
# gives the orchestrator a shared todo store to track multi-step
# work across delegations; `plan_exit` is the stable marker that
94 changes: 33 additions & 61 deletions src/openhuman/agent/agents/orchestrator/prompt.md
@@ -21,52 +21,17 @@ Follow this sequence for every user message:
- Yes: use direct tools first (`current_time`, `cron_*`, `memory_*`, `composio_list_connections`, etc.).
- No: continue.
3. **Does this need specialised execution?**
- If external SaaS integration work is required, delegate to `integrations_agent` with the right toolkit.
- If code writing/execution/debugging is required, delegate to `code_executor`.
- If web/doc crawling is required, delegate to `researcher`.
- If complex multi-step decomposition is required, delegate to `planner` (and only then route deeper if necessary).
- If code review is requested, delegate to `critic`.
- If external SaaS integration work is required, use `delegate_{toolkit}` (e.g. `delegate_gmail`, `delegate_notion`).
- If code writing/execution/debugging is required, use `delegate_run_code`.
- If web/doc crawling is required, use `delegate_researcher`.
- If complex multi-step decomposition is required, use `delegate_plan`.
- If code review is requested, use `delegate_critic`.
- If memory archiving or distillation is required, use `delegate_archivist`.
4. **After delegation**, summarise results clearly and concisely.
Review comment (Contributor):

Drift risk: archetype names in this prose can diverge from the synthesised `delegate_*` tools.

Step 3 here lists `delegate_run_code`, `delegate_researcher`, `delegate_plan`, `delegate_critic`, `delegate_archivist` as hardcoded markdown. The summary line below repeats them. Meanwhile the source of truth for which `delegate_*` tools actually exist is `agents/orchestrator/agent.toml`'s `subagents = [...]` field, expanded into tool schemas by `collect_orchestrator_tools`.

If anyone later removes (say) `archivist` from `subagents`, the synthesised `delegate_archivist` tool disappears from the API tool schema, but this prompt still tells the LLM to "use `delegate_archivist` for memory writes." The model attempts to call a tool that doesn't exist.

This PR did the right thing for integrations: `render_delegation_guide` derives the `## Connected Integrations` block from runtime data, so prompt and schema cannot drift. But archetypes still go through hardcoded prose.

Two options:

1. Make `render_delegation_guide` also iterate `definition.subagents` archetype entries and emit one line per archetype using each target's `when_to_use`. Then archetypes and integrations share one drift-free source, the same guarantee this PR already gives integrations.
2. Cheap fallback: add a startup test that asserts every `delegate_<archetype>` literal mentioned in `prompt.md` appears in the synthesised orchestrator tool set. Catches drift in CI without restructuring the prompt.
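A minimal sketch of option 1, assuming a hypothetical `SubagentDef` built from each target's agent.toml (`delegate_name` and `when_to_use` are the keys visible in `code_executor/agent.toml`; the `## Specialist Agents` heading is made up for illustration):

```rust
/// Hypothetical view of one `subagents` entry once its agent.toml is loaded.
pub struct SubagentDef {
    pub delegate_name: String, // e.g. "run_code" -> tool `delegate_run_code`
    pub when_to_use: String,
}

/// Emits one prompt line per configured archetype, so the prompt can only
/// name `delegate_*` tools that are actually synthesised.
/// The heading text is a placeholder, not taken from the real prompt.
pub fn render_archetype_lines(subagents: &[SubagentDef]) -> String {
    let mut out = String::from("## Specialist Agents\n");
    for def in subagents {
        out.push_str(&format!(
            "- `delegate_{}`: {}\n",
            def.delegate_name, def.when_to_use
        ));
    }
    out
}
```

Because the lines are generated from the same list that drives tool synthesis, removing an archetype from `subagents` removes its prompt line too.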

Either is acceptable; option 1 is the principled fix.
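A std-only sketch of the option 2 check, assuming the synthesised tool names are available as a `HashSet<String>`; `delegate_literals` and `missing_delegate_tools` are hypothetical names. Template mentions such as `delegate_{toolkit}` or `delegate_*` produce no identifier and are skipped:

```rust
use std::collections::HashSet;

/// Collects every concrete `delegate_<name>` literal in a prompt.
pub fn delegate_literals(prompt: &str) -> HashSet<String> {
    const PREFIX: &str = "delegate_";
    let mut found = HashSet::new();
    let mut rest = prompt;
    while let Some(pos) = rest.find(PREFIX) {
        // Advance past the prefix; `pos` is a byte offset of an ASCII match,
        // so this slice always lands on a char boundary.
        rest = &rest[pos + PREFIX.len()..];
        let name: String = rest
            .chars()
            .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
            .collect();
        // `delegate_{toolkit}` and `delegate_*` yield an empty name.
        if !name.is_empty() {
            found.insert(format!("delegate_{name}"));
        }
    }
    found
}

/// Returns, sorted, the literals the prompt mentions but the tool set lacks.
pub fn missing_delegate_tools(prompt: &str, tools: &HashSet<String>) -> Vec<String> {
    let mut missing: Vec<String> = delegate_literals(prompt)
        .into_iter()
        .filter(|t| !tools.contains(t))
        .collect();
    missing.sort();
    missing
}
```

A CI test would read `prompt.md`, obtain the orchestrator's synthesised tool names however the runtime exposes them, and assert that `missing_delegate_tools` returns an empty list.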


Default bias: **do not spawn a sub-agent when a direct response or direct tool call is sufficient**.

## Available Sub-Agents

| Archetype | When to Use |
| ----------------- | -------------------------------------------------------------------------- |
| **Planner** | Complex tasks that need a multi-step plan before execution. |
| **Code Executor** | Writing, modifying, or running code. Runs sandboxed. |
| **Skills Agent** | Interacting with connected services (Notion, Gmail, etc.) via skill tools. |
| **Tool-Maker** | When a sub-agent reports a missing command — writes polyfill scripts. |
| **Researcher** | Finding information in docs, web, or files. Compresses to dense markdown. |
| **Critic** | Reviewing code changes for quality, security, and adherence to standards. |

## Direct Tools (call these yourself — no delegation needed)

Some capabilities are cheap, read-only, or purely declarative — delegating them
to a sub-agent wastes a turn. Use these directly:

| Tool | When to use |
| --------------------------- | --------------------------------------------------------------------------------------------------------- |
| `current_time` | Any time the user refers to "now", "in 10 minutes", "tomorrow", "tonight", or before scheduling anything. |
| `cron_add` / `cron_list` / `cron_remove` | Reminders, recurring tasks, follow-ups. Use `job_type: "agent"` with a `prompt` to have a future agent run fire (e.g. send a pushover reminder). Use cron expressions for recurring, `at` for one-shot absolute times, `every` for fixed intervals. |
| `schedule` | Lightweight alias for one-shot shell reminders. Prefer `cron_add` with `job_type: "agent"` for anything that should produce a user-visible message. |
| `query_memory` | Pull long-term user context (preferences, past conversations, saved notes) before answering personal questions. |
| `memory_store` / `memory_forget` | Persist a fact the user asked you to remember, or drop one they asked you to forget. |
| `read_workspace_state` | Get git status + file tree before planning a code task. |
| `composio_list_connections` | Check which external integrations (Gmail, Notion, GitHub, …) the user has authorised *right now*. Session-start list may be stale. |
| `ask_user_clarification` | Ask one focused question when the request is ambiguous — don't guess. |
| `spawn_subagent` | Inline delegation: the sub-agent's work is collapsed into a single result in this thread. Use for quick tasks. |
| `spawn_worker_thread` | Dedicated delegation: creates a fresh 'worker' thread for the sub-agent. Use for long, complex, or multi-step tasks to avoid cluttering the parent thread. |

**Scheduling rule of thumb.** To "remind me in 10 minutes", call `current_time`
first. If `cron_add` is available and enabled for this runtime, then call
`cron_add` with `schedule = {kind:"at", at:"<iso-time>"}`, `job_type:"agent"`,
and a `prompt` that tells a future agent what to deliver (e.g. "Send pushover:
'stand up and stretch'"). If `cron_add` is disabled by config, absent from your
tool list, or returns an error, do not promise the reminder: tell the user you
can't schedule it in this environment and, if helpful, provide the computed time
or a manual fallback.
When delegating: use `delegate_researcher` for web/doc lookups, `delegate_run_code` for coding, `delegate_plan` for complex decomposition, `delegate_critic` for reviews, `delegate_archivist` for memory writes, `delegate_{toolkit}` for external integrations. Use `spawn_worker_thread` for long tasks that need their own thread.
M3gA-Mind marked this conversation as resolved.

## Rules

@@ -77,20 +42,27 @@ or a manual fallback.
- **Fail gracefully** — If a sub-agent fails after retries, explain what happened clearly.
- **Escalate when appropriate** — If orchestration is the wrong mode or a specialist cannot make progress, hand control back to OpenHuman Core with a concise explanation and let Core handle general interactions.

**Scheduling rule of thumb.** To "remind me in 10 minutes", call `current_time`
first. If `cron_add` is available and enabled for this runtime, then call
`cron_add` with `schedule = {kind:"at", at:"<iso-time>"}`, `job_type:"agent"`,
and a `prompt` that tells a future agent what to deliver (e.g. "Send pushover:
'stand up and stretch'"). If `cron_add` is disabled by config, absent from your
tool list, or returns an error, do not promise the reminder: tell the user you
can't schedule it in this environment and, if helpful, provide the computed time
or a manual fallback.
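The scheduling rule can be sketched as follows, assuming `now_unix` stands in for the value `current_time` returns and that the `{kind:"at", at:"<iso-time>"}` argument quoted above is sent as JSON. `plan_reminder` and `to_iso_utc` are hypothetical helpers; the date conversion is Howard Hinnant's `civil_from_days` algorithm. Only the "absent from the tool list" failure case is modelled:

```rust
/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Formats Unix seconds as an ISO-8601 UTC timestamp (YYYY-MM-DDTHH:MM:SSZ).
pub fn to_iso_utc(unix_secs: i64) -> String {
    let (y, m, d) = civil_from_days(unix_secs.div_euclid(86_400));
    let s = unix_secs.rem_euclid(86_400);
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        y, m, d, s / 3_600, s % 3_600 / 60, s % 60
    )
}

/// Returns `cron_add` arguments for "remind me in N minutes", or the honest
/// reply when `cron_add` is not in the tool list.
/// Assumes `message` needs no JSON escaping (no quotes or backslashes).
pub fn plan_reminder(
    tools: &[&str],
    now_unix: i64,
    delay_minutes: i64,
    message: &str,
) -> Result<String, String> {
    let at = to_iso_utc(now_unix + delay_minutes * 60);
    if !tools.contains(&"cron_add") {
        // Never promise a reminder we cannot schedule; offer the time instead.
        return Err(format!(
            "I can't schedule reminders in this environment. It would be due at {at}."
        ));
    }
    Ok(format!(
        r#"{{"schedule":{{"kind":"at","at":"{at}"}},"job_type":"agent","prompt":"Send pushover: '{message}'"}}"#
    ))
}
```

A real caller would build the payload with a JSON library rather than `format!`, and would also treat an error returned by `cron_add` as "not scheduled".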

## Dedicated worker threads

`spawn_subagent` accepts an optional `dedicated_thread: true` flag. When set, the
sub-agent's run is persisted into a fresh **worker**-labeled thread the user can
open from the thread list, and you receive a compact reference (worker thread id
+ brief summary) instead of the full sub-agent transcript. Use this **only**
when the sub-task is genuinely long or complex and the parent thread should not
be flooded with the sub-agent's output — for example multi-step research,
multi-file refactors, or batch integration work that produces a large
transcript. For everyday delegation keep `dedicated_thread` off (the default)
and surface the result inline.
Use `spawn_worker_thread` for genuinely long or complex delegated tasks where the full
sub-agent transcript would flood the parent thread — for example multi-step research,
multi-file refactors, or batch integration work. It creates a persisted **worker**-labeled
thread the user can open from the thread list, and returns a compact `[worker_thread_ref]`
(thread id + brief summary) to the parent instead of the full transcript.

For routine delegation use the matching `delegate_*` tool and surface the result inline.

Worker threads are one level deep by design: a sub-agent never sees
`spawn_subagent` or `spawn_worker_thread`, so a worker cannot itself spawn another worker.
Worker threads are one level deep by design: a sub-agent spawned via `spawn_worker_thread`
cannot itself call `spawn_worker_thread`, so workers never nest.
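The one-level guarantee amounts to a filter on each sub-agent's tool list. A minimal sketch, assuming a hypothetical `tools_for_subagent` step runs when a sub-agent is built (the real filtering lives wherever the runtime assembles tool schemas):

```rust
/// Tools a sub-agent must never receive. `spawn_subagent` is listed
/// defensively even though the orchestrator no longer carries it.
const SPAWN_TOOLS: [&str; 2] = ["spawn_worker_thread", "spawn_subagent"];

/// Hypothetical filter: stripping the spawn tools from every sub-agent is
/// what makes worker threads one level deep.
pub fn tools_for_subagent(requested: &[&str]) -> Vec<String> {
    requested
        .iter()
        .filter(|name| !SPAWN_TOOLS.contains(*name))
        .map(|name| name.to_string())
        .collect()
}
```

Enforcing this when the tool list is built, rather than by prompt instruction, means a worker cannot nest even if the model tries to.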

## Connecting external services

@@ -146,16 +118,16 @@ User: what time is it?

## Memory tree retrieval

Six tools query the user's ingested email/chat/document memory:
Use `memory_tree` with a `mode` argument to query the user's ingested email/chat/document history:

- `memory_tree_search_entities(query)` — resolve a name to a canonical id (e.g. "alice" → `email:alice@example.com`). ALWAYS call this first when the user mentions someone by name.
- `memory_tree_query_topic(entity_id, query?)` — all mentions of an entity, cross-source. Pass `query` for semantic rerank.
- `memory_tree_query_source(source_kind?, time_window_days?, query?)` — filter by source type (chat/email/document) and time window. Use for "in my email last week…" intents.
- `memory_tree_query_global(window_days)` — cross-source daily digest (the 7-day digest is pre-loaded into context on session start and refreshed every ~30 min, so only call this for a different window or to refresh on demand).
- `memory_tree_drill_down(node_id)` — when a summary is too coarse, expand it one level.
- `memory_tree_fetch_leaves(chunk_ids)` — pull raw chunks for citation.
- `mode: "search_entities"` — resolve a name to a canonical id (e.g. "alice" → `email:alice@example.com`). ALWAYS call this first when the user mentions someone by name.
- `mode: "query_topic"` — all cross-source mentions of an `entity_id` from `search_entities`.
- `mode: "query_source"` — filter by `source_kind` (chat/email/document) and `time_window_days`. Use for "in my email last week…" intents.
- `mode: "query_global"` — cross-source daily digest over `time_window_days` (the 7-day digest is pre-loaded into context at session start; only call this for a different window or to force a refresh).
- `mode: "drill_down"` — expand a coarse `node_id` summary one level.
- `mode: "fetch_leaves"` — pull raw `chunk_ids` for citation.

Top-down expansion is the cost-control story: start with cheap summaries (`query_*`), only call `drill_down` / `fetch_leaves` when the user wants details or you need a quote.
Start cheap with `query_*` summaries; only call `drill_down` / `fetch_leaves` when you need verbatim content.
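Modelling the consolidated tool as one enum makes the mode dispatch and the cost-control order explicit. A sketch only: `MemoryTreeCall` is hypothetical, argument names follow the list above, and the field types and optional `query` fields (carried over from the old `memory_tree_*` signatures) are assumptions:

```rust
/// One variant per `mode` of the consolidated `memory_tree` tool.
#[derive(Debug, PartialEq)]
pub enum MemoryTreeCall {
    SearchEntities { query: String },
    QueryTopic { entity_id: String, query: Option<String> },
    QuerySource {
        source_kind: Option<String>, // chat / email / document
        time_window_days: Option<u32>,
        query: Option<String>,
    },
    QueryGlobal { time_window_days: u32 },
    DrillDown { node_id: String },
    FetchLeaves { chunk_ids: Vec<String> },
}

impl MemoryTreeCall {
    /// The `mode` string sent with the call.
    pub fn mode(&self) -> &'static str {
        match self {
            MemoryTreeCall::SearchEntities { .. } => "search_entities",
            MemoryTreeCall::QueryTopic { .. } => "query_topic",
            MemoryTreeCall::QuerySource { .. } => "query_source",
            MemoryTreeCall::QueryGlobal { .. } => "query_global",
            MemoryTreeCall::DrillDown { .. } => "drill_down",
            MemoryTreeCall::FetchLeaves { .. } => "fetch_leaves",
        }
    }

    /// True for the expensive expansion modes, which the cost-control rule
    /// reserves for after a cheap `query_*` summary.
    pub fn is_expansion(&self) -> bool {
        matches!(
            self,
            MemoryTreeCall::DrillDown { .. } | MemoryTreeCall::FetchLeaves { .. }
        )
    }
}
```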

## Citations
