
feat(harness): harden agent loop for error-free worker sessions #222

Merged
andersonleal merged 3 commits into main from feat/harness-agent-hardening
Jun 4, 2026
@andersonleal andersonleal commented Jun 3, 2026

Collaborator

Summary

Hardens the iii agent harness so agent-built worker sessions run error-free, plus the iii-directory skills index the harness points agents at. Driven by analyzing real failing agent sessions and validated end-to-end by re-running build tasks (todo worker, pubsub/queue/stream pipeline, generic KV store) on Claude Sonnet 4.5 and Haiku 4.5.

feat(harness) — agent loop hardening

System prompt (turn-orchestrator/system-prompt.ts)

  • Load a worker's skill BEFORE building (not a last resort): engine = per-call contract, skill = approach.
  • Forbid importing foreign-ecosystem patterns (standalone servers, package managers, framework conventions) without checking the worker's skill first.
  • payload must be a JSON object, never a stringified JSON string.
  • Fetch the contract via engine::functions::info before any call; a list description is a hint, not the contract.
  • Read errors and change something before the next call; never resend an identical failed call; stop looping on repeating infra/timeout errors.

agent-trigger (turn-orchestrator/agent-trigger.ts)

  • coercePayload parses stringified-JSON payloads.
  • Decode failures are classified as invalid_arguments (not gate_unavailable), and the target's contract is auto-attached to the error so the next attempt is correctly shaped.
  • extractFirstJsonObject + extractStructuredHandlerError surface structured {code, fix} handler errors (e.g. S211) instead of burying them in gate_unavailable.
  • dispatchWithHook(targetCall) sends the target its untouched args so a payload function_id (e.g. for engine::functions::info) is never clobbered by the routing envelope; the approval hook still sees the envelope.
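To make the payload coercion concrete, here is a minimal TypeScript sketch of what a function like coercePayload might do. It is not the harness's actual implementation: the two-round decode limit and the rule that only objects and arrays are accepted are assumptions drawn from the behaviour described in this PR.

```typescript
// Hypothetical sketch of payload coercion; the real coercePayload may differ.
// Assumption: at most two decode rounds, so a double-encoded payload such as
// "\"{\\\"a\\\":1}\"" is still recovered, but nothing is unwrapped indefinitely.
function coercePayload(value: unknown): unknown {
  let current: unknown = value;
  for (let round = 0; round < 2; round++) {
    if (typeof current !== "string") break; // already a native value
    try {
      current = JSON.parse(current.trim()); // tolerate surrounding whitespace
    } catch {
      return value; // malformed JSON: return the input untouched
    }
  }
  // Only objects and arrays count as a decoded payload; a string that merely
  // parses to a number, boolean, or plain string is left exactly as it was.
  return typeof current === "object" && current !== null ? current : value;
}
```

With this shape, a call whose payload arrives as the string '{"title":"buy milk"}' reaches the target as a real object, while an ordinary string argument such as "hello" passes through unchanged.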

ports (function-execute/ports.ts) — dispatch the target with clean args while the hook sees the routing envelope.
config (config.ts) — default system_default_skills to [].
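The clean-args dispatch is easiest to see with the engine::functions::info case. The sketch below is synchronous for brevity and uses hypothetical types, a hypothetical router id ("agent::trigger"), and a hypothetical envelope layout; only the idea (the hook inspects the envelope, the target receives its original arguments) comes from this PR.

```typescript
// Hypothetical shapes for illustration only; not the harness's real types.
interface Call {
  function_id: string;
  payload: Record<string, unknown>;
}

type ApprovalHook = (envelope: Call) => boolean;
type Invoke = (call: Call) => unknown;

// Assumed envelope layout: the routing layer records which function is being
// targeted under the key function_id. If this envelope were forwarded to the
// target, a payload-level function_id would be overwritten.
function toRoutingEnvelope(target: Call): Call {
  return {
    function_id: "agent::trigger", // assumed router id
    payload: { ...target.payload, function_id: target.function_id },
  };
}

// The approval hook sees the envelope; the target is invoked with the
// untouched original call.
function dispatchSketch(target: Call, hook: ApprovalHook, invoke: Invoke): unknown {
  const envelope = toRoutingEnvelope(target);
  if (!hook(envelope)) {
    throw new Error(`call to ${target.function_id} was not approved`);
  }
  return invoke(target);
}
```

For a call to engine::functions::info with payload { function_id: "todo::create" }, the hook sees the routing value while the target still receives "todo::create", which is the regression the new test guards against.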

feat(iii-directory): skills::index

Per-worker overview of every installed worker's skills, with a character budget that always lists every worker (truncating only optional descriptions) and re-fetches worker::list per call for live state. The harness points agents here to discover what skills exist before building.

Test plan

  • Harness unit/regression suite green (1033 tests), incl. new tests: coercePayload, invalid_arguments + auto-attach contract, fetchContract, extractFirstJsonObject, clean-args dispatch (function_id clobber), and the system-prompt rules.
  • End-to-end validated by driving real agent sessions (console path) on Sonnet 4.5 + Haiku 4.5 building a todo worker, a pubsub/queue/stream pipeline, and a generic KV store — each independently verified live (functions registered, HTTP endpoints serving real data), not via agent self-report.
  • CI green.

Summary by CodeRabbit

  • New Features

    • Automatic decoding of JSON payloads sent as encoded strings
    • Fresh worker discovery so newly registered skills appear immediately
  • Bug Fixes

    • Routing envelope handling fixed so calls keep intended arguments
    • Argument-decode failures now classified with clearer invalid-arguments hints
    • Orchestrator config default for system_default_skills now empty
  • Documentation

    • Expanded agent-trigger guidance emphasizing contract checks and error handling
  • Tests

    • Broadened test coverage for payload decoding, error extraction, and discovery behaviors

System prompt: load a worker's skill before building (not as a last
resort), forbid importing foreign-ecosystem patterns without checking the
skill, require payload as a JSON object, fetch the contract before
calling, and recover from errors instead of blind identical retries.

agent-trigger: coerce stringified payloads; classify decode failures as
invalid_arguments and auto-attach the target's contract; surface
structured {code,fix} handler errors via a brace-balanced extractor; and
send the target its untouched args so a payload function_id is never
clobbered by the routing envelope.
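A brace-balanced extractor of the kind described above can be sketched as follows. This is an illustrative reconstruction, not the code from agent-trigger.ts: the function bodies and the rule that a structured error needs a string code field are assumptions.

```typescript
// Hypothetical sketch: find the first brace-balanced {...} span in free text,
// ignoring braces that appear inside JSON string literals.
function extractFirstJsonObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false; // character after a backslash is literal
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // braces never balanced
}

interface HandlerError {
  code: string;
  [key: string]: unknown;
}

// Try the whole message as JSON first, then fall back to an embedded object.
function extractStructuredHandlerError(message: string): HandlerError | null {
  for (const candidate of [message, extractFirstJsonObject(message)]) {
    if (candidate === null) continue;
    try {
      const parsed: unknown = JSON.parse(candidate);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        typeof (parsed as { code?: unknown }).code === "string"
      ) {
        return parsed as HandlerError;
      }
    } catch {
      // not valid JSON; try the next candidate
    }
  }
  return null;
}
```

One limitation of this sketch: it only considers the first opening brace, so prose that contains a stray brace before the real error object would defeat it.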

ports: trigger the target with clean args while the hook still sees the
routing envelope. config: default system_default_skills to [].

Adds regression tests across agent-trigger, system-prompt, and config.

Render a per-worker overview of every installed worker's skills, with a
character budget that always lists every worker and truncates only the
optional descriptions. Re-fetch worker::list per call so the index
reflects live state. The harness system prompt points agents here to
discover what skills exist before building.

@coderabbitai

coderabbitai Bot commented Jun 3, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between b7656ec and bc59d53.

📒 Files selected for processing (3)
  • harness/src/turn-orchestrator/agent-trigger.ts
  • harness/tests/turn-orchestrator/agent-trigger.test.ts
  • harness/tests/turn-orchestrator/system-prompt.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • harness/tests/turn-orchestrator/agent-trigger.test.ts
  • harness/tests/turn-orchestrator/system-prompt.test.ts

📝 Walkthrough

Coerce JSON-encoded object/array payloads, extract embedded structured handler errors, classify argument-decode failures as invalid_arguments with optional contract enrichment, separate routing envelope from the real target call, expand system prompt guidance, change orchestrator default skills to empty, and add fresh resolution plus per-worker-budget rendering for skills index.

Changes

Agent Trigger & Skills Discovery

| Layer | File(s) | Summary |
| --- | --- | --- |
| Payload Coercion & Unwrap | harness/src/turn-orchestrator/agent-trigger.ts, harness/tests/turn-orchestrator/agent-trigger.test.ts | coercePayload decodes JSON-encoded string payloads (objects/arrays) into native objects/arrays; unwrapAgentTrigger applies coercion to arguments and arguments.payload. Tool description updated and tests added for double-encoded inputs, whitespace, identity, and malformed inputs. |
| Structured Error Extraction & Helpers | harness/src/turn-orchestrator/agent-trigger.ts, harness/tests/turn-orchestrator/agent-trigger.test.ts | extractFirstJsonObject finds the first brace-balanced JSON object (string/escape-aware); extractStructuredHandlerError tries direct parse then embedded-object fallback to recover {code,message} envelopes. Tests cover nested objects, escaped quotes, trailing junk, and null cases. |
| Contract Fetching (engine::functions::info) | harness/src/turn-orchestrator/agent-trigger.ts, harness/tests/turn-orchestrator/agent-trigger.test.ts | fetchContract best-effort-queries the engine for a function contract, returning details or raw value, or null on empty id or error. Tests exercise structured/raw responses and loop/self-introspection avoidance. |
| Argument Decode Classification & Enrichment | harness/src/turn-orchestrator/agent-trigger.ts, harness/tests/turn-orchestrator/agent-trigger.test.ts | triggerFunctionCall treats argument-deserialization failures as invalid_arguments (not gate_unavailable), optionally attaches fetched contract details, and returns enriched messages; tests validate contract attachment and fallback behavior. |
| Routing Envelope vs Target Call | harness/src/turn-orchestrator/agent-trigger.ts, harness/src/turn-orchestrator/function-execute/ports.ts, harness/tests/turn-orchestrator/agent-trigger.test.ts | dispatchWithHook accepts a separate targetCall so hooks/policies inspect a routing envelope while the actual invocation uses the original target call; ports pass both wrapped and original calls. Regression test ensures target args are not overwritten by routing metadata. |
| Orchestrator Config Default | harness/src/turn-orchestrator/config.ts, harness/tests/turn-orchestrator/config.test.ts | system_default_skills default changed from ['iii://iii-directory/index'] to []; test updated to expect an empty fallback. |
| System Prompt Expansion | harness/src/turn-orchestrator/system-prompt.ts, harness/tests/turn-orchestrator/system-prompt.test.ts | IDENTITY_PREAMBLE rewritten to mandate pre-call contract discovery via engine::functions::info, require JSON object payloads (not JSON-encoded strings), describe discovery hierarchy and retry/error rules, and add extensive test coverage asserting these constraints. |
| Skills Cache Fresh Resolution | iii-directory/src/functions/skills.rs | RegisteredWorkersCache::get_fresh added to bypass TTL; resolve_visible_skills gains fresh flag. list/get use cached path, index uses fresh path to surface newly registered workers immediately. |
| Skills Index Rendering & Budget | iii-directory/src/functions/skills.rs, iii-directory/src/functions/skills.rs tests | render_index_markdown always includes all worker headings and get pointers; per-worker descriptions are budget-capped and omitted descriptions are noted with pointer to directory::skills::get. Tests updated to reflect new behavior. |
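The decode-failure classification in the walkthrough can be illustrated with a small TypeScript sketch. Everything here is hypothetical: the message patterns used to detect a decode failure, the synchronous lookupContract callback standing in for fetchContract, and the simplification that every other failure maps to gate_unavailable are assumptions, not the harness's actual logic.

```typescript
// Hypothetical sketch of classifying an argument-decode failure and attaching
// the target's contract so the next attempt can be shaped correctly.
type FailureKind = "invalid_arguments" | "gate_unavailable";

interface TriggerFailure {
  kind: FailureKind;
  message: string;
  contract?: unknown;
}

const INFO_FUNCTION_ID = "engine::functions::info";

// Assumed heuristic: these substrings stand in for whatever the real code
// uses to recognise a deserialization error.
function looksLikeDecodeFailure(message: string): boolean {
  return /deserializ|invalid type|missing field/i.test(message);
}

function classifyFailure(
  functionId: string,
  message: string,
  lookupContract: (id: string) => unknown,
): TriggerFailure {
  if (!looksLikeDecodeFailure(message)) {
    // Simplification: the real classifier likely has more categories.
    return { kind: "gate_unavailable", message };
  }
  let contract: unknown = null;
  // Skip the lookup for an empty id, and for the info function itself so the
  // error path never introspects the introspection call.
  if (functionId !== "" && functionId !== INFO_FUNCTION_ID) {
    try {
      contract = lookupContract(functionId) ?? null; // best effort
    } catch {
      contract = null;
    }
  }
  if (contract === null) return { kind: "invalid_arguments", message };
  return {
    kind: "invalid_arguments",
    message: `${message} (expected contract attached)`,
    contract,
  };
}
```

The design point is that a lookup failure never masks the original error: the caller still gets invalid_arguments, just without the attached contract.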

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • sergiofilhowz

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main objective of hardening the agent loop for error-free worker sessions, which aligns with the PR's primary purpose. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@github-actions

github-actions Bot commented Jun 3, 2026

Contributor

skill-check — worker

0 verified, 14 skipped (no docs/).

Layer Result
structure
vale
ai
render

Four for four. Nicely done.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@iii-directory/src/functions/skills.rs`:
- Around line 1196-1203: The code is wrongly comparing out.len() (which includes
headings and “Full reference” pointers) against INDEX_CHAR_BUDGET, so
headings/pointers eat into the per-worker description budget; change the logic
to track only description characters when enforcing INDEX_CHAR_BUDGET. Introduce
a counter (e.g., descriptions_len or desc_chars_used) that accumulates only
description text lengths (use the same desc_cost calculation) and replace the
check if out.len() + desc_cost <= INDEX_CHAR_BUDGET with if descriptions_len +
desc_cost <= INDEX_CHAR_BUDGET, increment descriptions_len when you push a
description, and leave omitted_descriptions behavior unchanged; reference
worker.description, INDEX_CHAR_BUDGET, out and omitted_descriptions to locate
the code.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 3440a04 and b7656ec.

📒 Files selected for processing (8)
  • harness/src/turn-orchestrator/agent-trigger.ts
  • harness/src/turn-orchestrator/config.ts
  • harness/src/turn-orchestrator/function-execute/ports.ts
  • harness/src/turn-orchestrator/system-prompt.ts
  • harness/tests/turn-orchestrator/agent-trigger.test.ts
  • harness/tests/turn-orchestrator/config.test.ts
  • harness/tests/turn-orchestrator/system-prompt.test.ts
  • iii-directory/src/functions/skills.rs

Comment on lines 1196 to +1203
         if !worker.description.is_empty() {
-            block.push('\n');
-            block.push_str(&format!("{}\n", worker.description));
+            let desc_cost = "\n".len() + worker.description.len() + "\n".len();
+            if out.len() + desc_cost <= INDEX_CHAR_BUDGET {
+                out.push('\n');
+                out.push_str(&format!("{}\n", worker.description));
+            } else {
+                omitted_descriptions = true;
+            }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't charge headings and get pointers against the description budget.

INDEX_CHAR_BUDGET is described as the cap for per-worker descriptions, but this branch checks out.len(), so previously emitted headings and Full reference lines consume that budget too. With many workers and short descriptions, later descriptions get dropped much earlier than intended.

Proposed fix
-    let mut omitted_descriptions = false;
+    let mut omitted_descriptions = false;
+    let mut description_chars_used = 0;
@@
         if !worker.description.is_empty() {
             let desc_cost = "\n".len() + worker.description.len() + "\n".len();
-            if out.len() + desc_cost <= INDEX_CHAR_BUDGET {
+            if description_chars_used + desc_cost <= INDEX_CHAR_BUDGET {
                 out.push('\n');
                 out.push_str(&format!("{}\n", worker.description));
+                description_chars_used += desc_cost;
             } else {
                 omitted_descriptions = true;
             }
         }
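The effect of the proposed accounting change can be sketched in TypeScript (the real code is Rust, so this is a port of the idea rather than of the file). The heading format, the pointer line, and the closing note are hypothetical; only the rule that description characters alone are charged against the budget comes from the review comment. Note that JavaScript string length counts UTF-16 code units, whereas Rust's len() counts bytes.

```typescript
// Hypothetical port of the budget logic: every worker always gets a heading
// and a pointer, and only description text is charged against the budget.
interface WorkerEntry {
  name: string;
  description: string;
}

function renderIndexSketch(workers: WorkerEntry[], charBudget: number): string {
  let out = "";
  let descriptionCharsUsed = 0; // counts description text only
  let omittedDescriptions = false;
  for (const worker of workers) {
    out += `## ${worker.name}\n`; // assumed heading format
    if (worker.description.length > 0) {
      const descCost = 1 + worker.description.length + 1; // "\n" + text + "\n"
      if (descriptionCharsUsed + descCost <= charBudget) {
        out += `\n${worker.description}\n`;
        descriptionCharsUsed += descCost;
      } else {
        omittedDescriptions = true;
      }
    }
    out += "Full reference: directory::skills::get\n\n"; // assumed pointer text
  }
  if (omittedDescriptions) {
    out += "Some descriptions were omitted; use directory::skills::get for details.\n";
  }
  return out;
}
```

Because headings and pointers no longer count toward the budget, adding more workers cannot by itself push earlier descriptions out.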

The CI `biome ci` step (Biome 2.4.10) flagged format diffs on the changed
files; the local Biome was 1.9.4 so they weren't caught pre-push. Format-only,
no logic change. `biome ci harness` now passes; tests green.
@andersonleal andersonleal merged commit 84871c4 into main Jun 4, 2026
12 checks passed