feat(harness): harden agent loop for error-free worker sessions by andersonleal · Pull Request #222 · iii-hq/workers

andersonleal · 2026-06-03T20:48:56Z

Summary

Hardens the iii agent harness so agent-built worker sessions run error-free, plus the iii-directory skills index the harness points agents at. Driven by analyzing real failing agent sessions and validated end-to-end by re-running build tasks (todo worker, pubsub/queue/stream pipeline, generic KV store) on Claude Sonnet 4.5 and Haiku 4.5.

`feat(harness)` — agent loop hardening

System prompt (turn-orchestrator/system-prompt.ts)

- Load a worker's skill BEFORE building, not as a last resort: the engine supplies the per-call contract, the skill supplies the approach.
- Forbid importing foreign-ecosystem patterns (standalone servers, package managers, framework conventions) without checking the worker's skill first.
- `payload` must be a JSON object, never a stringified JSON value.
- Fetch the contract via `engine::functions::info` before any call; a list description is a hint, not the contract.
- Read errors and change something before the next call; never resend an identical failed call; stop looping on repeating infra/timeout errors.

agent-trigger (turn-orchestrator/agent-trigger.ts)

- `coercePayload` parses stringified-JSON payloads.
- Decode failures are classified as `invalid_arguments` (not `gate_unavailable`), and the target's contract is auto-attached to the error so the next attempt can be shaped correctly.
- `extractFirstJsonObject` and `extractStructuredHandlerError` surface structured {code, fix} handler errors (e.g. S211) instead of burying them in `gate_unavailable`.
- `dispatchWithHook(targetCall)` sends the target its untouched args, so a `function_id` inside the payload (e.g. for `engine::functions::info`) is never clobbered by the routing envelope; the approval hook still sees the envelope.
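
As a rough illustration of the payload coercion described above, here is a minimal sketch. It is not the code from `agent-trigger.ts`: the body, the pre-check, and the decision to leave malformed strings untouched are assumptions made for the example; only the name `coercePayload` comes from this PR.

```typescript
// Hypothetical sketch of payload coercion; the real coercePayload may differ.
function coercePayload(value: unknown): unknown {
  // Only strings are candidates for decoding; everything else passes through.
  if (typeof value !== "string") return value;

  const trimmed = value.trim();
  // Cheap pre-check: only attempt a parse for text shaped like an object or array.
  const looksStructured =
    (trimmed.startsWith("{") && trimmed.endsWith("}")) ||
    (trimmed.startsWith("[") && trimmed.endsWith("]"));
  if (!looksStructured) return value;

  try {
    const parsed: unknown = JSON.parse(trimmed);
    // Coerce only objects and arrays; anything else keeps its original form.
    return typeof parsed === "object" && parsed !== null ? parsed : value;
  } catch {
    // Malformed JSON: return the original string so the caller can report it.
    return value;
  }
}
```

Under these assumptions, `coercePayload('{"id": 1}')` yields an object, while `coercePayload("hello")` and `coercePayload("{not json}")` come back unchanged.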

ports (function-execute/ports.ts) — dispatch the target with clean args while the hook sees the routing envelope.
config (config.ts) — default system_default_skills to [].
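
To make the envelope/target split concrete, the sketch below shows why the target should receive its own args rather than the routing envelope. The types, the envelope shape (routing metadata merged over the args), and the synchronous callbacks are assumptions chosen for brevity; the real `dispatchWithHook` and `ports.ts` signatures are not shown in this PR description.

```typescript
// Hypothetical shapes for illustration only.
type Args = Record<string, unknown>;

interface TargetCall {
  functionId: string; // the function the agent wants to run
  args: Args;         // exactly what the agent supplied
}

// Assumed envelope shape: routing metadata merged over the args. The routing
// function_id overwrites any function_id key that the args already carry.
function buildEnvelope(call: TargetCall): Args {
  return { ...call.args, function_id: call.functionId };
}

function dispatchWithHook(
  targetCall: TargetCall,
  approve: (envelope: Args) => boolean,
  invoke: (functionId: string, args: Args) => unknown,
): unknown {
  // The approval hook inspects the routing envelope...
  if (!approve(buildEnvelope(targetCall))) {
    throw new Error(`call to ${targetCall.functionId} was not approved`);
  }
  // ...but the target is invoked with the untouched args.
  return invoke(targetCall.functionId, targetCall.args);
}
```

With a call to `engine::functions::info` whose args contain `function_id: "todo::create"` (a made-up target name), the hook sees `function_id: "engine::functions::info"` while the engine still receives `"todo::create"`.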

`feat(iii-directory)` — `skills::index`

Per-worker overview of every installed worker's skills, with a character budget that always lists every worker (truncating only optional descriptions) and re-fetches worker::list per call for live state. The harness points agents here to discover what skills exist before building.
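
The budget behaviour can be pictured with a short sketch. The real renderer is Rust (`render_index_markdown` in `iii-directory/src/functions/skills.rs`); this TypeScript version is only an approximation, and the heading format, the note wording, and the budget value are assumptions. It counts only description characters against the budget, which follows the suggestion made in review further down this thread and may not match the merged code.

```typescript
// Hypothetical TypeScript approximation of a budgeted skills index.
interface WorkerEntry {
  name: string;
  description: string; // optional in spirit: an empty string means "none"
}

const INDEX_CHAR_BUDGET = 200; // assumed value, for illustration only

function renderIndex(workers: WorkerEntry[], budget: number = INDEX_CHAR_BUDGET): string {
  let out = "";
  let descriptionCharsUsed = 0;
  let omitted = false;

  for (const worker of workers) {
    // Headings are unconditional, so every worker is always listed.
    out += `## ${worker.name}\n`;

    if (worker.description.length > 0) {
      // Cost of a blank line, the description, and its trailing newline.
      const cost = worker.description.length + 2;
      if (descriptionCharsUsed + cost <= budget) {
        out += `\n${worker.description}\n`;
        descriptionCharsUsed += cost;
      } else {
        omitted = true;
      }
    }
  }

  if (omitted) {
    out += "\n(Some descriptions were omitted; call directory::skills::get for details.)\n";
  }
  return out;
}
```

The design choice worth noting is that the budget can only drop descriptions, never a worker heading, so an agent reading the index always sees the full set of installed workers.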

Test plan

- Harness unit/regression suite green (1033 tests), including new tests for coercePayload, invalid_arguments + auto-attach contract, fetchContract, extractFirstJsonObject, clean-args dispatch (function_id clobber), and the system-prompt rules.
- End-to-end validated by driving real agent sessions (console path) on Sonnet 4.5 + Haiku 4.5 building a todo worker, a pubsub/queue/stream pipeline, and a generic KV store. Each was independently verified live (functions registered, HTTP endpoints serving real data), not via agent self-report.
- CI green.

Summary by CodeRabbit

New Features
- Automatic decoding of JSON payloads sent as encoded strings
- Fresh worker discovery so newly registered skills appear immediately
Bug Fixes
- Routing envelope handling fixed so calls keep intended arguments
- Argument-decode failures now classified with clearer invalid-arguments hints
- Orchestrator config default for system_default_skills now empty
Documentation
- Expanded agent-trigger guidance emphasizing contract checks and error handling
Tests
- Broadened test coverage for payload decoding, error extraction, and discovery behaviors

System prompt: load a worker's skill before building (not as a last resort), forbid importing foreign-ecosystem patterns without checking the skill, require payload as a JSON object, fetch the contract before calling, and recover from errors instead of blind identical retries.

agent-trigger: coerce stringified payloads; classify decode failures as invalid_arguments and auto-attach the target's contract; surface structured {code,fix} handler errors via a brace-balanced extractor; and send the target its untouched args so a payload function_id is never clobbered by the routing envelope.

ports: trigger the target with clean args while the hook still sees the routing envelope.

config: default system_default_skills to [].

Adds regression tests across agent-trigger, system-prompt, and config.

Render a per-worker overview of every installed worker's skills, with a character budget that always lists every worker and truncates only the optional descriptions. Re-fetch worker::list per call so the index reflects live state. The harness system prompt points agents here to discover what skills exist before building.

vercel · 2026-06-03T20:48:58Z

Project	Deployment	Actions	Updated (UTC)
workers	Ready	Preview, Comment	Jun 3, 2026 8:58pm

coderabbitai · 2026-06-03T20:49:10Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fd25c5e8-a2b6-4c4d-8cca-3ec6092a48ba

📥 Commits

Reviewing files that changed from the base of the PR and between b7656ec and bc59d53.

📒 Files selected for processing (3)

harness/src/turn-orchestrator/agent-trigger.ts
harness/tests/turn-orchestrator/agent-trigger.test.ts
harness/tests/turn-orchestrator/system-prompt.test.ts

🚧 Files skipped from review as they are similar to previous changes (2)

harness/tests/turn-orchestrator/agent-trigger.test.ts
harness/tests/turn-orchestrator/system-prompt.test.ts

Walkthrough

Coerce JSON-encoded object/array payloads, extract embedded structured handler errors, classify argument-decode failures as invalid_arguments with optional contract enrichment, separate routing envelope from the real target call, expand system prompt guidance, change orchestrator default skills to empty, and add fresh resolution plus per-worker-budget rendering for skills index.
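
The reclassification step in this walkthrough could look roughly like the sketch below. The two error codes come from the PR; everything else is assumed for illustration: the message patterns used to recognise a decode failure, the result shape, and the synchronous `fetchContract` callback (the real one queries the engine and is presumably asynchronous).

```typescript
// Hypothetical sketch of classifying an argument-decode failure.
type ErrorCode = "invalid_arguments" | "gate_unavailable";

interface TriggerError {
  code: ErrorCode;
  message: string;
  contract?: unknown; // attached only when a contract could be fetched
}

// Assumed heuristic: these patterns are placeholders, not the harness's rules.
function isArgumentDecodeFailure(message: string): boolean {
  return /deserializ|invalid type|missing field/i.test(message);
}

function classifyFailure(
  message: string,
  fetchContract: () => unknown, // returns null when no contract is available
): TriggerError {
  if (!isArgumentDecodeFailure(message)) {
    // Other failures keep their existing classification; gate_unavailable
    // stands in for that here.
    return { code: "gate_unavailable", message };
  }
  const contract = fetchContract();
  if (contract === null || contract === undefined) {
    // Best-effort enrichment: fall back to the bare message.
    return { code: "invalid_arguments", message };
  }
  return {
    code: "invalid_arguments",
    message: `${message} (the target's contract is attached)`,
    contract,
  };
}
```

The point of the enrichment is that the agent gets the expected argument shape in the same turn as the error, rather than having to issue a separate `engine::functions::info` call first.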

Changes

Agent Trigger & Skills Discovery

Layer / File(s)	Summary
Payload Coercion & Unwrap `harness/src/turn-orchestrator/agent-trigger.ts`, `harness/tests/turn-orchestrator/agent-trigger.test.ts`	`coercePayload` decodes JSON-encoded string payloads (objects/arrays) into native objects/arrays; `unwrapAgentTrigger` applies coercion to `arguments` and `arguments.payload`. Tool description updated and tests added for double-encoded inputs, whitespace, identity, and malformed inputs.
Structured Error Extraction & Helpers `harness/src/turn-orchestrator/agent-trigger.ts`, `harness/tests/turn-orchestrator/agent-trigger.test.ts`	`extractFirstJsonObject` finds the first brace-balanced JSON object (string/escape-aware); `extractStructuredHandlerError` tries direct parse then embedded-object fallback to recover `{code,message}` envelopes. Tests cover nested objects, escaped quotes, trailing junk, and null cases.
Contract Fetching (engine::functions::info) `harness/src/turn-orchestrator/agent-trigger.ts`, `harness/tests/turn-orchestrator/agent-trigger.test.ts`	`fetchContract` best-effort-queries the engine for a function contract, returning `details` or raw value, or `null` on empty id or error. Tests exercise structured/raw responses and loop/self-introspection avoidance.
Argument Decode Classification & Enrichment `harness/src/turn-orchestrator/agent-trigger.ts`, `harness/tests/turn-orchestrator/agent-trigger.test.ts`	`triggerFunctionCall` treats argument-deserialization failures as `invalid_arguments` (not `gate_unavailable`), optionally attaches fetched contract details, and returns enriched messages; tests validate contract attachment and fallback behavior.
Routing Envelope vs Target Call `harness/src/turn-orchestrator/agent-trigger.ts`, `harness/src/turn-orchestrator/function-execute/ports.ts`, `harness/tests/turn-orchestrator/agent-trigger.test.ts`	`dispatchWithHook` accepts a separate `targetCall` so hooks/policies inspect a routing envelope while the actual invocation uses the original target call; ports pass both wrapped and original calls. Regression test ensures target args are not overwritten by routing metadata.
Orchestrator Config Default `harness/src/turn-orchestrator/config.ts`, `harness/tests/turn-orchestrator/config.test.ts`	`system_default_skills` default changed from `['iii://iii-directory/index']` to `[]`; test updated to expect an empty fallback.
System Prompt Expansion `harness/src/turn-orchestrator/system-prompt.ts`, `harness/tests/turn-orchestrator/system-prompt.test.ts`	`IDENTITY_PREAMBLE` rewritten to mandate pre-call contract discovery via `engine::functions::info`, require JSON object payloads (not JSON-encoded strings), describe discovery hierarchy and retry/error rules, and add extensive test coverage asserting these constraints.
Skills Cache Fresh Resolution `iii-directory/src/functions/skills.rs`	`RegisteredWorkersCache::get_fresh` added to bypass TTL; `resolve_visible_skills` gains `fresh` flag. `list`/`get` use cached path, `index` uses fresh path to surface newly registered workers immediately.
Skills Index Rendering & Budget `iii-directory/src/functions/skills.rs`, `iii-directory/src/functions/skills.rs` tests	`render_index_markdown` always includes all worker headings and `get` pointers; per-worker descriptions are budget-capped and omitted descriptions are noted with pointer to `directory::skills::get`. Tests updated to reflect new behavior.
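
The "brace-balanced, string/escape-aware" extraction named in the table can be sketched as a small scanner. This is an illustrative version only: the return type and the choice to retry from the next `{` after a failed parse are assumptions, not confirmed behaviour of the harness's `extractFirstJsonObject`.

```typescript
// Hypothetical sketch of a brace-balanced JSON object extractor.
function extractFirstJsonObject(text: string): Record<string, unknown> | null {
  let start = text.indexOf("{");
  while (start !== -1) {
    let depth = 0;
    let inString = false;
    let escaped = false;

    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        // Inside a string literal, braces do not count; track escapes so an
        // escaped quote does not end the string early.
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === "{") depth++;
      else if (ch === "}") {
        depth--;
        if (depth === 0) {
          try {
            // A balanced span that parses is the first embedded object.
            return JSON.parse(text.slice(start, i + 1)) as Record<string, unknown>;
          } catch {
            break; // balanced but not valid JSON; try the next "{"
          }
        }
      }
    }
    start = text.indexOf("{", start + 1);
  }
  return null;
}
```

For example, given `handler failed: {"code":"S211","fix":"..."} trailing text`, the scanner returns the parsed object and ignores the surrounding prose; the `fix` content in that example is made up.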

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

sergiofilhowz

"A rabbit's note on payloads light and airy,
We parse the strings that once felt wary.
When errors hide in human prose, we find the key,
Fresh skills appear — index full and free.
Hops and tests, kept tidy and merry."

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main objective of hardening the agent loop for error-free worker sessions, which aligns with the PR's primary purpose.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

github-actions · 2026-06-03T20:49:18Z

skill-check — worker

0 verified, 14 skipped (no docs/).

Layer	Result
structure	✓
vale	✓
ai	✓
render	✓

Four for four. Nicely done.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@iii-directory/src/functions/skills.rs`:
- Around line 1196-1203: The code is wrongly comparing out.len() (which includes
headings and “Full reference” pointers) against INDEX_CHAR_BUDGET, so
headings/pointers eat into the per-worker description budget; change the logic
to track only description characters when enforcing INDEX_CHAR_BUDGET. Introduce
a counter (e.g., descriptions_len or desc_chars_used) that accumulates only
description text lengths (use the same desc_cost calculation) and replace the
check if out.len() + desc_cost <= INDEX_CHAR_BUDGET with if descriptions_len +
desc_cost <= INDEX_CHAR_BUDGET, increment descriptions_len when you push a
description, and leave omitted_descriptions behavior unchanged; reference
worker.description, INDEX_CHAR_BUDGET, out and omitted_descriptions to locate
the code.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9449cab5-f411-4c5e-b8f1-5c91d77ab89f

📥 Commits

Reviewing files that changed from the base of the PR and between 3440a04 and b7656ec.

📒 Files selected for processing (8)

harness/src/turn-orchestrator/agent-trigger.ts
harness/src/turn-orchestrator/config.ts
harness/src/turn-orchestrator/function-execute/ports.ts
harness/src/turn-orchestrator/system-prompt.ts
harness/tests/turn-orchestrator/agent-trigger.test.ts
harness/tests/turn-orchestrator/config.test.ts
harness/tests/turn-orchestrator/system-prompt.test.ts
iii-directory/src/functions/skills.rs

coderabbitai · 2026-06-03T20:56:59Z

        if !worker.description.is_empty() {
-            block.push('\n');
-            block.push_str(&format!("{}\n", worker.description));
+            let desc_cost = "\n".len() + worker.description.len() + "\n".len();
+            if out.len() + desc_cost <= INDEX_CHAR_BUDGET {
+                out.push('\n');
+                out.push_str(&format!("{}\n", worker.description));
+            } else {
+                omitted_descriptions = true;
+            }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't charge headings and get pointers against the description budget.

INDEX_CHAR_BUDGET is described as the cap for per-worker descriptions, but this branch checks out.len(), so previously emitted headings and Full reference lines consume that budget too. With many workers and short descriptions, later descriptions get dropped much earlier than intended.

Proposed fix

- let mut omitted_descriptions = false;
+ let mut omitted_descriptions = false;
+ let mut description_chars_used = 0;
@@
         if !worker.description.is_empty() {
             let desc_cost = "\n".len() + worker.description.len() + "\n".len();
-            if out.len() + desc_cost <= INDEX_CHAR_BUDGET {
+            if description_chars_used + desc_cost <= INDEX_CHAR_BUDGET {
                 out.push('\n');
                 out.push_str(&format!("{}\n", worker.description));
+                description_chars_used += desc_cost;
             } else {
                 omitted_descriptions = true;
             }
         }

andersonleal added 2 commits June 3, 2026 17:47

vercel Bot deployed to Preview June 3, 2026 20:49 View deployment

coderabbitai Bot reviewed Jun 3, 2026

style(harness): apply Biome 2.4.10 formatting to satisfy CI

bc59d53

The CI `biome ci` step (Biome 2.4.10) flagged format diffs on the changed files; the local Biome was 1.9.4 so they weren't caught pre-push. Format-only, no logic change. `biome ci harness` now passes; tests green.

vercel Bot deployed to Preview June 3, 2026 20:58 View deployment

sergiofilhowz approved these changes Jun 3, 2026

andersonleal merged commit 84871c4 into main Jun 4, 2026
12 checks passed

andersonleal merged 3 commits into main from feat/harness-agent-hardening
