P2 refinements: advisor decision, tool-surface visibility, compaction/effort findings, Haiku tiering by stxkxs · Pull Request #23 · nanohype/fab

stxkxs · 2026-06-03T22:09:33Z

The four P2 evaluations from the mid-2026 capabilities plan, each researched against primary docs + the installed SDK types and adversarially verified before deciding. Independent of #21/#22 (branches off main). One commit per item.

The recurring finding: fab's default (managed-agents) transport — and its regulated (Bedrock) path — lag several newer Messages-API features. Three of the four "adopt native feature X" ideas therefore resolve to "can't on our transport / already handled / keep custom." Getting these right means not shipping code that wouldn't run or would silently regress behavior.

P2.b — native advisor tool → keep `consult_advisor` (`33fa0eb`)

Native advisor tool (beta advisor-tool-2026-03-01) can't preserve fab's invariants: only a per-request max_uses (no per-session budget; per-conversation capping means hand-stripping history), and it's beta on Claude API / CPA only — not Bedrock, not the Managed Agents toolset. Recorded the rationale in src/advisor.ts so it isn't silently swapped.

P2.c — Tool Search / defer_loading → can't adopt yet; made the cost visible (`a42d9e0`)

Corrected a wrong plan assumption: 23/83 roles wire ≥4 MCP servers (github alone ~50 tools), so a role can load 100+ tool defs eagerly. But defer_loading/Tool Search is Messages-API-only — not exposed by the Managed Agents API. Added summarizeToolSurface + a single deploy-time line surfacing the latent context cost (no per-role noise), documented the constraint + revisit trigger. Tested, incl. against the live roster.

P2.d — compaction → already automatic; nothing to wire (`6e94139`)

The installed Agent SDK 0.3.x types show compaction is automatic in the Claude Code loop (SDKStatus: 'compacting', Pre/PostCompact hooks); it's not a query() option and the SDK betas union only accepts context-1m-2025-08-07 — so the plan's compact-2026-01-12 sketch wasn't even expressible. managed-agents handles context via its session log. Recorded at the call site.

P2.e — Haiku/effort tiering → pilot methodology, no blind flips (`716abf5`)

Documented the Haiku cost opportunity (0 roles on Haiku today), concrete candidate roles, and a data-driven pilot path via the existing fab model set override (firm roles skip the gate, so flipping defaults blind is unsafe). effort deferred (shape change + not on the managed-agents surface).

Verification

npm run lint / build / format:check clean · npm test 270/270 (+3 new).

…mpaction, tiering The four P2 evaluations from the mid-2026 Claude/Anthropic capabilities review, each researched against primary docs + the installed SDK types and adversarially verified before deciding. Recurring finding: fab's default (managed-agents) transport and Bedrock lag several newer Messages-API features, so three of the four "adopt native feature X" ideas correctly resolve to keep / can't-yet / already-handled — recorded in-code so the decisions aren't silently reversed. ─── P2.b · advisor tool → keep custom consult_advisor ─── Anthropic's native advisor tool (beta advisor-tool-2026-03-01) can't preserve fab's invariants: only a per-REQUEST max_uses (no per-session budget), and it's beta on the Claude API + Claude Platform on AWS only — not Bedrock, not the Managed Agents toolset. Rationale recorded as a header comment in src/advisor.ts. ─── P2.c · Tool Search / defer_loading → can't adopt yet; cost made visible ─── Corrected a wrong assumption: 23 of 83 roles wire ≥4 MCP servers (github alone exposes ~50 tools), so a role can load 100+ tool definitions eagerly. But defer_loading / Tool Search is Messages-API-only — not exposed by the Managed Agents API. Added summarizeToolSurface() + a single deploy-time line surfacing the latent context cost (no per-role noise), documented the constraint + revisit trigger in src/mcp.ts. Tested, incl. against the live roster. ─── P2.d · compaction → already automatic; nothing to wire ─── The installed Agent SDK 0.3.x types show the Claude Code loop auto-compacts (SDKStatus 'compacting' + Pre/PostCompact hooks); it is not a query() option and the SDK betas union only accepts context-1m-2025-08-07. managed-agents handles long context via its durable session log. Recorded at the sdk.ts query() site. ─── P2.e · Haiku / effort tiering → pilot methodology, no blind flips ─── fab runs 82 roles on Sonnet, 0 on Haiku (a real $1/$5 vs $3/$15 lever), but flipping defaults without eval data silently risks quality and firm roles skip the merge gate. docs/roster.md gains a Model tiering section: candidate roles, the data-driven pilot path via `fab model set`, and the effort deferral. No behavior change beyond the deploy-time tool-surface line. Verified: npm run lint / build / format:check clean; npm test passes. Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>

stxkxs force-pushed the feat/p2-refinements branch from 716abf5 to a3abb11 Compare June 4, 2026 00:24

stxkxs force-pushed the feat/p2-refinements branch from a3abb11 to 7f660df Compare June 4, 2026 02:10

stxkxs marked this pull request as ready for review June 4, 2026 02:10

stxkxs merged commit 9cfb5c1 into main Jun 4, 2026

stxkxs deleted the feat/p2-refinements branch June 4, 2026 02:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

P2 refinements: advisor decision, tool-surface visibility, compaction/effort findings, Haiku tiering#23

P2 refinements: advisor decision, tool-surface visibility, compaction/effort findings, Haiku tiering#23
stxkxs merged 1 commit into
mainfrom
feat/p2-refinements

stxkxs commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

stxkxs commented Jun 3, 2026

P2.b — native advisor tool → keep consult_advisor (33fa0eb)

P2.c — Tool Search / defer_loading → can't adopt yet; made the cost visible (a42d9e0)

P2.d — compaction → already automatic; nothing to wire (6e94139)

P2.e — Haiku/effort tiering → pilot methodology, no blind flips (716abf5)

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

P2.b — native advisor tool → keep `consult_advisor` (`33fa0eb`)

P2.c — Tool Search / defer_loading → can't adopt yet; made the cost visible (`a42d9e0`)

P2.d — compaction → already automatic; nothing to wire (`6e94139`)

P2.e — Haiku/effort tiering → pilot methodology, no blind flips (`716abf5`)