feat(harness): harden agent loop for error-free worker sessions #222
System prompt: load a worker's skill before building (not as a last
resort), forbid importing foreign-ecosystem patterns without checking the
skill, require payload as a JSON object, fetch the contract before
calling, and recover from errors instead of blind identical retries.
agent-trigger: coerce stringified payloads; classify decode failures as
invalid_arguments and auto-attach the target's contract; surface
structured {code,fix} handler errors via a brace-balanced extractor; and
send the target its untouched args so a payload function_id is never
clobbered by the routing envelope.
ports: trigger the target with clean args while the hook still sees the
routing envelope. config: default system_default_skills to [].
Adds regression tests across agent-trigger, system-prompt, and config.
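The payload coercion and decode-failure classification described in this commit message could look roughly like the sketch below. It is an illustration under assumptions, not the harness code: `coercePayloadSketch`, `CoerceResult`, and `JsonObject` are hypothetical names, and only the `invalid_arguments` code comes from the PR text.

```typescript
// Hypothetical shapes; the real harness types are not shown on this page.
type JsonObject = { [key: string]: unknown };

type CoerceResult =
  | { ok: true; payload: JsonObject | unknown[] }
  | { ok: false; code: "invalid_arguments"; message: string };

// Accepts an object/array as-is, or a string that JSON-parses to one.
// Anything else is reported as invalid_arguments instead of being retried blindly.
function coercePayloadSketch(raw: unknown): CoerceResult {
  let value: unknown = raw;
  if (typeof raw === "string") {
    try {
      value = JSON.parse(raw);
    } catch {
      return {
        ok: false,
        code: "invalid_arguments",
        message: "payload is a string but not valid JSON",
      };
    }
  }
  if (typeof value === "object" && value !== null) {
    return { ok: true, payload: value as JsonObject | unknown[] };
  }
  const kind = value === null ? "null" : typeof value;
  return {
    ok: false,
    code: "invalid_arguments",
    message: `payload must be a JSON object, got ${kind}`,
  };
}
```

In this sketch a stringified object such as `'{"id": 1}'` is accepted and parsed, while `"not json"` or a bare number yields an `invalid_arguments` result that a caller could enrich with the target's contract.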
Render a per-worker overview of every installed worker's skills, with a character budget that always lists every worker and truncates only the optional descriptions. Re-fetch worker::list per call so the index reflects live state. The harness system prompt points agents here to discover what skills exist before building.
No actionable comments were generated in the recent review.
Walkthrough
Coerce JSON-encoded object/array payloads, extract embedded structured handler errors, classify argument-decode failures as invalid_arguments with optional contract enrichment, separate the routing envelope from the real target call, expand system prompt guidance, change orchestrator default skills to empty, and add fresh resolution plus per-worker-budget rendering for the skills index.
Changes: Agent Trigger & Skills Discovery
Estimated code review effort: 4 (Complex), ~60 minutes
Pre-merge checks: 5 passed
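The walkthrough's "extract embedded structured handler errors" step relies on a brace-balanced extractor. A minimal sketch of that idea follows, assuming the handler error arrives as free text with a JSON object embedded somewhere inside it. The function names carry a `Sketch` suffix because they are illustrative stand-ins, not the PR's `extractFirstJsonObject` and `extractStructuredHandlerError`.

```typescript
// Returns the first balanced {...} substring that parses as a JSON object, or null.
// Braces inside JSON string literals are ignored while counting depth.
function extractFirstJsonObjectSketch(text: string): Record<string, unknown> | null {
  for (let start = text.indexOf("{"); start !== -1; start = text.indexOf("{", start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === "{") depth++;
      else if (ch === "}") {
        depth--;
        if (depth === 0) {
          try {
            const parsed: unknown = JSON.parse(text.slice(start, i + 1));
            if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
              return parsed as Record<string, unknown>;
            }
          } catch {
            // Balanced but not valid JSON; try the next "{" instead.
          }
          break;
        }
      }
    }
  }
  return null;
}

// Assumed error shape, based on the {code, fix} pair named in the PR text.
interface StructuredHandlerErrorSketch {
  code: string;
  fix: string;
}

function extractStructuredHandlerErrorSketch(message: string): StructuredHandlerErrorSketch | null {
  const obj = extractFirstJsonObjectSketch(message);
  if (obj === null) return null;
  const code = obj["code"];
  const fix = obj["fix"];
  return typeof code === "string" && typeof fix === "string" ? { code, fix } : null;
}
```

Scanning with a depth counter rather than a regular expression is what lets the sketch cope with nested objects and with braces that appear inside string values.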
skill-check — worker0 verified, 14 skipped (no docs/).
Four for four. Nicely done.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@iii-directory/src/functions/skills.rs`:
- Around line 1196-1203: The code is wrongly comparing out.len() (which includes
headings and “Full reference” pointers) against INDEX_CHAR_BUDGET, so
headings/pointers eat into the per-worker description budget; change the logic
to track only description characters when enforcing INDEX_CHAR_BUDGET. Introduce
a counter (e.g., descriptions_len or desc_chars_used) that accumulates only
description text lengths (use the same desc_cost calculation) and replace the
check if out.len() + desc_cost <= INDEX_CHAR_BUDGET with if descriptions_len +
desc_cost <= INDEX_CHAR_BUDGET, increment descriptions_len when you push a
description, and leave omitted_descriptions behavior unchanged; reference
worker.description, INDEX_CHAR_BUDGET, out and omitted_descriptions to locate
the code.
📒 Files selected for processing (8)
- harness/src/turn-orchestrator/agent-trigger.ts
- harness/src/turn-orchestrator/config.ts
- harness/src/turn-orchestrator/function-execute/ports.ts
- harness/src/turn-orchestrator/system-prompt.ts
- harness/tests/turn-orchestrator/agent-trigger.test.ts
- harness/tests/turn-orchestrator/config.test.ts
- harness/tests/turn-orchestrator/system-prompt.test.ts
- iii-directory/src/functions/skills.rs
     if !worker.description.is_empty() {
-        block.push('\n');
-        block.push_str(&format!("{}\n", worker.description));
+        let desc_cost = "\n".len() + worker.description.len() + "\n".len();
+        if out.len() + desc_cost <= INDEX_CHAR_BUDGET {
+            out.push('\n');
+            out.push_str(&format!("{}\n", worker.description));
+        } else {
+            omitted_descriptions = true;
+        }
Don't charge headings and get pointers against the description budget.
INDEX_CHAR_BUDGET is described as the cap for per-worker descriptions, but this branch checks out.len(), so previously emitted headings and Full reference lines consume that budget too. With many workers and short descriptions, later descriptions get dropped much earlier than intended.
Proposed fix
-    let mut omitted_descriptions = false;
+    let mut omitted_descriptions = false;
+    let mut description_chars_used = 0;
@@
     if !worker.description.is_empty() {
         let desc_cost = "\n".len() + worker.description.len() + "\n".len();
-        if out.len() + desc_cost <= INDEX_CHAR_BUDGET {
+        if description_chars_used + desc_cost <= INDEX_CHAR_BUDGET {
             out.push('\n');
             out.push_str(&format!("{}\n", worker.description));
+            description_chars_used += desc_cost;
         } else {
             omitted_descriptions = true;
         }
     }
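To make the effect of the proposed change concrete, here is a small TypeScript transliteration of the budgeting logic. The real code is Rust in `skills.rs`. Everything here is a sketch under assumptions: the `## name` heading format and the names `WorkerEntry` and `renderIndexSketch` are hypothetical, and JavaScript's `length` counts UTF-16 code units whereas Rust's `len()` counts bytes.

```typescript
interface WorkerEntry {
  name: string;
  description: string;
}

// Renders every worker heading unconditionally; only description text
// is charged against `budget`, mirroring the reviewer's proposed counter.
function renderIndexSketch(
  workers: WorkerEntry[],
  budget: number,
): { text: string; omittedDescriptions: boolean } {
  let out = "";
  let descriptionCharsUsed = 0;
  let omittedDescriptions = false;
  for (const worker of workers) {
    out += `## ${worker.name}\n`; // heading format is an assumption
    if (worker.description.length > 0) {
      // Cost is "\n" + description + "\n", as in the Rust snippet.
      const descCost = 1 + worker.description.length + 1;
      if (descriptionCharsUsed + descCost <= budget) {
        out += `\n${worker.description}\n`;
        descriptionCharsUsed += descCost;
      } else {
        omittedDescriptions = true;
      }
    }
  }
  return { text: out, omittedDescriptions };
}
```

With two workers whose descriptions cost 10 characters each and a budget of 20, this version keeps both descriptions. A check against the total output length would drop the second one, because the headings already emitted would count toward the same 20 characters.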
The CI `biome ci` step (Biome 2.4.10) flagged format diffs on the changed files; the local Biome was 1.9.4 so they weren't caught pre-push. Format-only, no logic change. `biome ci harness` now passes; tests green.
Summary
Hardens the iii agent harness so agent-built worker sessions run error-free, plus the `iii-directory` skills index the harness points agents at. Driven by analyzing real failing agent sessions and validated end-to-end by re-running build tasks (todo worker, pubsub/queue/stream pipeline, generic KV store) on Claude Sonnet 4.5 and Haiku 4.5.
feat(harness): agent loop hardening
System prompt (`turn-orchestrator/system-prompt.ts`)
- `payload` must be a JSON object, never stringified JSON.
- Fetch the contract with `engine::functions::info` before any call; a `list` description is a hint, not the contract.
agent-trigger (`turn-orchestrator/agent-trigger.ts`)
- `coercePayload` parses stringified-JSON payloads.
- Argument-decode failures are classified as `invalid_arguments` (not `gate_unavailable`), and the target's contract is auto-attached to the error so the next attempt is correctly shaped.
- `extractFirstJsonObject` + `extractStructuredHandlerError` surface structured `{code, fix}` handler errors (e.g. S211) instead of burying them in `gate_unavailable`.
- `dispatchWithHook(targetCall)` sends the target its untouched args so a payload `function_id` (e.g. for `engine::functions::info`) is never clobbered by the routing envelope; the approval hook still sees the envelope.
ports (`function-execute/ports.ts`): dispatch the target with clean args while the hook sees the routing envelope.
config (`config.ts`): default `system_default_skills` to `[]`.
feat(iii-directory): `skills::index`
Per-worker overview of every installed worker's skills, with a character budget that always lists every worker (truncating only optional descriptions) and re-fetches `worker::list` per call for live state. The harness points agents here to discover what skills exist before building.
Test plan
- Regression tests for `coercePayload`, `invalid_arguments` + auto-attach contract, `fetchContract`, `extractFirstJsonObject`, clean-args dispatch (`function_id` clobber), and the system-prompt rules.
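The clean-args dispatch from the PR description can be pictured with the sketch below. It assumes the routing envelope carries its own `function_id` next to the payload. The interfaces, the `Sketch`-suffixed function, and the synchronous callbacks are hypothetical simplifications; the real `dispatchWithHook` signature is not shown on this page.

```typescript
type Args = Record<string, unknown>;

interface TargetCall {
  functionId: string; // which function the agent wants to trigger
  args: Args; // the target's own arguments, possibly containing `function_id`
}

// Hypothetical envelope shape: what the approval hook gets to inspect.
interface RoutingEnvelope {
  function_id: string;
  payload: Args;
}

// Kept synchronous for brevity; a real dispatcher would likely be async.
function dispatchWithHookSketch(
  call: TargetCall,
  approve: (envelope: RoutingEnvelope) => boolean,
  trigger: (functionId: string, args: Args) => unknown,
): unknown {
  const envelope: RoutingEnvelope = { function_id: call.functionId, payload: call.args };
  if (!approve(envelope)) {
    throw new Error(`call to ${call.functionId} was not approved`);
  }
  // The target receives its own args. Merging the envelope over them,
  // e.g. { ...call.args, function_id: call.functionId }, would clobber
  // a payload-level `function_id`.
  return trigger(call.functionId, call.args);
}
```

For a call to `engine::functions::info` with args `{ function_id: "todo::create" }`, the hook in this sketch sees `function_id: "engine::functions::info"` on the envelope while the target still receives `function_id: "todo::create"`.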