P2 refinements: advisor decision, tool-surface visibility, compaction/effort findings, Haiku tiering #23

Merged: stxkxs merged 1 commit into main from feat/p2-refinements on Jun 4, 2026

stxkxs commented Jun 3, 2026

The four P2 evaluations from the mid-2026 capabilities plan, each researched against primary docs + the installed SDK types and adversarially verified before deciding. Independent of #21/#22 (branches off main). One commit per item.

The recurring finding: fab's default (managed-agents) transport — and its regulated (Bedrock) path — lag several newer Messages-API features. Three of the four "adopt native feature X" ideas therefore resolve to "can't on our transport / already handled / keep custom." Getting these right means not shipping code that wouldn't run or would silently regress behavior.

P2.b — native advisor tool → keep consult_advisor (33fa0eb)

Native advisor tool (beta advisor-tool-2026-03-01) can't preserve fab's invariants: it offers only a per-request max_uses (no per-session budget, and capping per conversation would mean hand-stripping history), and it's beta on the Claude API and Claude Platform on AWS (CPA) only, not on Bedrock and not in the Managed Agents toolset. Recorded the rationale in src/advisor.ts so it isn't silently swapped.
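The key difference is scope. A per-request `max_uses` resets on every request, but fab needs a cap that persists across a whole session. The sketch below illustrates that invariant. `AdvisorSessionBudget` is a hypothetical name used only for this example; it is not the actual consult_advisor code in src/advisor.ts.

```typescript
// Hypothetical sketch: AdvisorSessionBudget is an illustrative name, not fab's
// actual consult_advisor implementation in src/advisor.ts.
// It models the invariant a per-request max_uses cannot express: the cap
// persists across every request made within one session.
class AdvisorSessionBudget {
  private readonly used = new Map<string, number>();
  private readonly maxPerSession: number;

  constructor(maxPerSession: number) {
    this.maxPerSession = maxPerSession;
  }

  /** Records one advisor call; returns false if the session's budget is spent. */
  tryConsume(sessionId: string): boolean {
    const n = this.used.get(sessionId) ?? 0;
    if (n >= this.maxPerSession) return false;
    this.used.set(sessionId, n + 1);
    return true;
  }

  /** Calls left for this session; unseen sessions have the full budget. */
  remaining(sessionId: string): number {
    return this.maxPerSession - (this.used.get(sessionId) ?? 0);
  }
}

// Usage: with a budget of 2, the third call in session "s1" is refused,
// even if each call arrives in a separate request.
const budget = new AdvisorSessionBudget(2);
console.log(budget.tryConsume("s1")); // true
console.log(budget.tryConsume("s1")); // true
console.log(budget.tryConsume("s1")); // false
console.log(budget.tryConsume("s2")); // true
```

Counting per session ID rather than per request is exactly what the native tool's request-scoped `max_uses` cannot do without rewriting conversation history.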

P2.c — Tool Search / defer_loading → can't adopt yet; made the cost visible (a42d9e0)

Corrected a wrong plan assumption: 23/83 roles wire ≥4 MCP servers (github alone ~50 tools), so a role can load 100+ tool defs eagerly. But defer_loading/Tool Search is Messages-API-only — not exposed by the Managed Agents API. Added summarizeToolSurface + a single deploy-time line surfacing the latent context cost (no per-role noise), documented the constraint + revisit trigger. Tested, incl. against the live roster.
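The sketch below shows how a single deploy-time line can summarize the eager tool surface without emitting per-role noise. It rests on several assumptions. The shapes, `summarizeToolSurfaceSketch`, and `formatToolSurfaceLine` are hypothetical, and the real `summarizeToolSurface()` in src/mcp.ts may have a different signature and output. The tool counts in the usage roster are made up, except github's ~50, which comes from the PR.

```typescript
// Hypothetical sketch: these shapes and function names are illustrative;
// fab's real summarizeToolSurface() in src/mcp.ts may differ.
interface McpServerInfo {
  name: string;
  toolCount: number; // tool definitions the server exposes (loaded eagerly)
}

interface RoleToolSurface {
  role: string;
  mcpServers: McpServerInfo[];
}

interface ToolSurfaceSummary {
  roles: number;
  serverThreshold: number;
  heavyRoles: number; // roles wiring >= serverThreshold MCP servers
  maxTools: number; // largest eager tool-definition count for any one role
  maxToolsRole: string | null;
}

function summarizeToolSurfaceSketch(
  roster: RoleToolSurface[],
  serverThreshold = 4,
): ToolSurfaceSummary {
  let heavyRoles = 0;
  let maxTools = 0;
  let maxToolsRole: string | null = null;
  for (const r of roster) {
    if (r.mcpServers.length >= serverThreshold) heavyRoles++;
    const tools = r.mcpServers.reduce((sum, s) => sum + s.toolCount, 0);
    if (tools > maxTools) {
      maxTools = tools;
      maxToolsRole = r.role;
    }
  }
  return { roles: roster.length, serverThreshold, heavyRoles, maxTools, maxToolsRole };
}

// One aggregate line for the whole deploy instead of one line per role.
function formatToolSurfaceLine(s: ToolSurfaceSummary): string {
  return (
    `tool surface: ${s.heavyRoles}/${s.roles} roles wire >=${s.serverThreshold} MCP servers; ` +
    `largest eager surface ${s.maxTools} tool defs (${s.maxToolsRole ?? "n/a"})`
  );
}

// Usage with made-up roles and counts (only github's ~50 comes from the PR).
const roster: RoleToolSurface[] = [
  {
    role: "role-a",
    mcpServers: [
      { name: "github", toolCount: 50 },
      { name: "server-b", toolCount: 20 },
      { name: "server-c", toolCount: 15 },
      { name: "server-d", toolCount: 20 },
    ],
  },
  { role: "role-b", mcpServers: [{ name: "github", toolCount: 50 }] },
];
console.log(formatToolSurfaceLine(summarizeToolSurfaceSketch(roster)));
```

This shows how four servers can push a single role past 100 eagerly loaded definitions, which is the cost that defer_loading would avoid on the Messages API.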

P2.d — compaction → already automatic; nothing to wire (6e94139)

The installed Agent SDK 0.3.x types show compaction is automatic in the Claude Code loop (SDKStatus: 'compacting', Pre/PostCompact hooks); it's not a query() option and the SDK betas union only accepts context-1m-2025-08-07 — so the plan's compact-2026-01-12 sketch wasn't even expressible. managed-agents handles context via its session log. Recorded at the call site.

P2.e — Haiku/effort tiering → pilot methodology, no blind flips (716abf5)

Documented the Haiku cost opportunity (0 roles on Haiku today), concrete candidate roles, and a data-driven pilot path via the existing `fab model set` override. Flipping defaults blind is unsafe because firm roles skip the merge gate. `effort` tiering is deferred: it needs a shape change and isn't on the managed-agents surface.
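A data-driven pilot needs an explicit decision rule before any default flips. The sketch below shows one possible rule under stated assumptions. `PilotResult`, `pilotDecision`, and the `minRuns`/`maxDrop` thresholds are hypothetical and not fab's tiering policy. Only the idea of piloting a role on Haiku (for example via `fab model set`) comes from the PR.

```typescript
// Hypothetical sketch: PilotResult, pilotDecision, minRuns and maxDrop are
// illustrative assumptions, not fab's tiering policy or CLI.
interface PilotResult {
  model: string;
  runs: number; // eval runs completed for this role on this model
  passes: number; // runs judged acceptable
}

type PilotDecision = "flip" | "keep" | "need-more-data";

// Recommend moving a role's default to the cheaper model only when both arms
// have enough runs and the candidate's pass rate drops by at most maxDrop.
function pilotDecision(
  baseline: PilotResult,
  candidate: PilotResult,
  minRuns = 20,
  maxDrop = 0.05,
): PilotDecision {
  if (baseline.runs < minRuns || candidate.runs < minRuns) return "need-more-data";
  const baseRate = baseline.passes / baseline.runs;
  const candRate = candidate.passes / candidate.runs;
  return baseRate - candRate <= maxDrop ? "flip" : "keep";
}

const sonnet: PilotResult = { model: "sonnet", runs: 20, passes: 18 };
console.log(pilotDecision(sonnet, { model: "haiku", runs: 20, passes: 18 })); // flip
console.log(pilotDecision(sonnet, { model: "haiku", runs: 20, passes: 12 })); // keep
console.log(pilotDecision(sonnet, { model: "haiku", runs: 5, passes: 5 })); // need-more-data
```

Returning "need-more-data" rather than defaulting to "flip" keeps the rule conservative. This matters most for firm roles, whose output does not pass through the merge gate.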

Verification

npm run lint / build / format:check clean · npm test 270/270 (+3 new).

stxkxs force-pushed the feat/p2-refinements branch from 716abf5 to a3abb11 on June 4, 2026 00:24
…mpaction, tiering

The four P2 evaluations from the mid-2026 Claude/Anthropic capabilities review,
each researched against primary docs + the installed SDK types and adversarially
verified before deciding. Recurring finding: fab's default (managed-agents)
transport and Bedrock lag several newer Messages-API features, so three of the
four "adopt native feature X" ideas correctly resolve to keep / can't-yet /
already-handled — recorded in-code so the decisions aren't silently reversed.

─── P2.b · advisor tool → keep custom consult_advisor ───

Anthropic's native advisor tool (beta advisor-tool-2026-03-01) can't preserve
fab's invariants: only a per-REQUEST max_uses (no per-session budget), and it's
beta on the Claude API + Claude Platform on AWS only — not Bedrock, not the
Managed Agents toolset. Rationale recorded as a header comment in src/advisor.ts.

─── P2.c · Tool Search / defer_loading → can't adopt yet; cost made visible ───

Corrected a wrong assumption: 23 of 83 roles wire ≥4 MCP servers (github alone
exposes ~50 tools), so a role can load 100+ tool definitions eagerly. But
defer_loading / Tool Search is Messages-API-only — not exposed by the Managed
Agents API. Added summarizeToolSurface() + a single deploy-time line surfacing
the latent context cost (no per-role noise), documented the constraint + revisit
trigger in src/mcp.ts. Tested, incl. against the live roster.

─── P2.d · compaction → already automatic; nothing to wire ───

The installed Agent SDK 0.3.x types show the Claude Code loop auto-compacts
(SDKStatus 'compacting' + Pre/PostCompact hooks); it is not a query() option and
the SDK betas union only accepts context-1m-2025-08-07. managed-agents handles
long context via its durable session log. Recorded at the sdk.ts query() site.

─── P2.e · Haiku / effort tiering → pilot methodology, no blind flips ───

fab runs 82 roles on Sonnet and 0 on Haiku (a real $1/$5 vs $3/$15 lever), but
flipping defaults without eval data risks silent quality regressions, and firm
roles skip the merge gate. docs/roster.md gains a Model tiering section: candidate
roles, the data-driven pilot path via `fab model set`, and the effort deferral.
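The following arithmetic illustrates the size of the lever. It assumes the quoted figures are list prices in USD per million input/output tokens, and it ignores prompt caching and batch discounts. The token counts are made up.

```typescript
// Assumes $/million-token list prices as quoted ($1/$5 Haiku, $3/$15 Sonnet);
// ignores prompt caching, batch discounts and any Bedrock pricing differences.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

const SONNET: Pricing = { inputPerMTok: 3, outputPerMTok: 15 };
const HAIKU: Pricing = { inputPerMTok: 1, outputPerMTok: 5 };

function runCostUSD(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * p.inputPerMTok +
    (outputTokens / 1_000_000) * p.outputPerMTok
  );
}

// Illustrative run: 2M input tokens, 200k output tokens.
const sonnetCost = runCostUSD(SONNET, 2_000_000, 200_000);
const haikuCost = runCostUSD(HAIKU, 2_000_000, 200_000);
console.log(`sonnet $${sonnetCost.toFixed(2)} vs haiku $${haikuCost.toFixed(2)}`); // sonnet $9.00 vs haiku $3.00
```

Both the input and output prices differ by exactly 3x. At list price, a role moved to Haiku therefore costs one third as much for any input/output mix. The open question is quality, not the size of the savings.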

No behavior change beyond the deploy-time tool-surface line. Verified: npm run
lint / build / format:check clean; npm test passes.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
stxkxs force-pushed the feat/p2-refinements branch from a3abb11 to 7f660df on June 4, 2026 02:10
stxkxs marked this pull request as ready for review June 4, 2026 02:10
stxkxs merged commit 9cfb5c1 into main Jun 4, 2026
stxkxs deleted the feat/p2-refinements branch June 4, 2026 02:13