P2 refinements: advisor decision, tool-surface visibility, compaction/effort findings, Haiku tiering #23
Merged
…mpaction, tiering

The four P2 evaluations from the mid-2026 Claude/Anthropic capabilities review, each researched against primary docs + the installed SDK types and adversarially verified before deciding.

Recurring finding: fab's default (managed-agents) transport and Bedrock lag several newer Messages-API features, so three of the four "adopt native feature X" ideas correctly resolve to keep / can't-yet / already-handled. The decisions are recorded in-code so they aren't silently reversed.

─── P2.b · advisor tool → keep custom consult_advisor ───
Anthropic's native advisor tool (beta advisor-tool-2026-03-01) can't preserve fab's invariants: it offers only a per-REQUEST max_uses (no per-session budget), and it's beta on the Claude API + Claude Platform on AWS only, not on Bedrock and not in the Managed Agents toolset. Rationale recorded as a header comment in src/advisor.ts.

─── P2.c · Tool Search / defer_loading → can't adopt yet; cost made visible ───
Corrected a wrong assumption: 23 of 83 roles wire ≥4 MCP servers (github alone exposes ~50 tools), so a role can load 100+ tool definitions eagerly. But defer_loading / Tool Search is Messages-API-only and is not exposed by the Managed Agents API. Added summarizeToolSurface() + a single deploy-time line surfacing the latent context cost (no per-role noise), and documented the constraint + revisit trigger in src/mcp.ts. Tested, incl. against the live roster.

─── P2.d · compaction → already automatic; nothing to wire ───
The installed Agent SDK 0.3.x types show the Claude Code loop auto-compacts (SDKStatus 'compacting' + Pre/PostCompact hooks); it is not a query() option and the SDK betas union only accepts context-1m-2025-08-07. managed-agents handles long context via its durable session log. Recorded at the sdk.ts query() site.
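To make the P2.c deploy-time line concrete, here is a minimal sketch of how such a summary could be computed. It is not the `summarizeToolSurface()` added in this PR, whose signature is not shown here: the `RoleToolConfig` shape, the function names, the output wording, and every tool count other than github's ~50 are assumptions for illustration only.

```typescript
// Hypothetical shape: fab's real role / MCP config types are not shown in this PR.
interface RoleToolConfig {
  name: string;
  // MCP server name -> number of tools that server exposes (illustrative counts).
  mcpServers: Record<string, number>;
}

interface ToolSurfaceSummary {
  roleCount: number;
  heavyRoleCount: number; // roles wiring at least `serverThreshold` MCP servers
  maxToolsPerRole: number; // largest eagerly loaded tool count across all roles
}

// Aggregate across the roster so the deploy log gets one line, not one per role.
function summarizeToolSurfaceSketch(
  roles: RoleToolConfig[],
  serverThreshold = 4,
): ToolSurfaceSummary {
  let heavyRoleCount = 0;
  let maxToolsPerRole = 0;
  for (const role of roles) {
    const counts = Object.values(role.mcpServers);
    if (counts.length >= serverThreshold) heavyRoleCount += 1;
    const total = counts.reduce((sum, n) => sum + n, 0);
    maxToolsPerRole = Math.max(maxToolsPerRole, total);
  }
  return { roleCount: roles.length, heavyRoleCount, maxToolsPerRole };
}

function formatToolSurfaceLine(s: ToolSurfaceSummary, serverThreshold = 4): string {
  return (
    `tool surface: ${s.heavyRoleCount}/${s.roleCount} roles wire >=${serverThreshold} ` +
    `MCP servers; up to ${s.maxToolsPerRole} tool definitions loaded eagerly`
  );
}

// Made-up two-role roster: one heavy role, one light role.
const exampleRoles: RoleToolConfig[] = [
  { name: "reviewer", mcpServers: { github: 50, tracker: 20, chat: 15, errors: 18 } },
  { name: "scribe", mcpServers: { notes: 12 } },
];
const exampleSummary = summarizeToolSurfaceSketch(exampleRoles);
console.log(formatToolSurfaceLine(exampleSummary));
```

Aggregating before printing is what keeps the output to a single line while still exposing the worst-case eager load.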
─── P2.e · Haiku / effort tiering → pilot methodology, no blind flips ───
fab runs 82 roles on Sonnet, 0 on Haiku (a real $1/$5 vs $3/$15 lever), but flipping defaults without eval data risks silent quality regressions, and firm roles skip the merge gate. docs/roster.md gains a Model tiering section: candidate roles, the data-driven pilot path via `fab model set`, and the effort deferral.

No behavior change beyond the deploy-time tool-surface line.

Verified: npm run lint / build / format:check clean; npm test passes.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
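The size of the P2.e cost lever can be worked through with a short sketch. It assumes the "$1/$5 vs $3/$15" figures are USD per million input/output tokens, which the commit message does not state explicitly, and the token volumes are invented for the example.

```typescript
// Assumption: prices are USD per million tokens (input / output).
interface TokenPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

const SONNET_PRICE: TokenPrice = { inputPerMTok: 3, outputPerMTok: 15 };
const HAIKU_PRICE: TokenPrice = { inputPerMTok: 1, outputPerMTok: 5 };

// Cost in USD for a given token volume at a given price point.
function costUsd(price: TokenPrice, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * price.inputPerMTok +
    (outputTokens / 1_000_000) * price.outputPerMTok
  );
}

// Hypothetical workload for one role: 4M input tokens, 1M output tokens.
const sonnetCost = costUsd(SONNET_PRICE, 4_000_000, 1_000_000); // 4*3 + 1*15 = 27
const haikuCost = costUsd(HAIKU_PRICE, 4_000_000, 1_000_000); // 4*1 + 1*5 = 9
console.log({ sonnetCost, haikuCost, ratio: sonnetCost / haikuCost });
```

Because both the input and output rates are one third of Sonnet's, the ratio is 3 for any input/output mix. The arithmetic says nothing about output quality, which is why the PR proposes a measured pilot rather than flipping defaults.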
The four P2 evaluations from the mid-2026 capabilities plan, each researched against primary docs + the installed SDK types and adversarially verified before deciding. Independent of #21/#22 (branches off main). One commit per item.

The recurring finding: fab's default (managed-agents) transport and its regulated (Bedrock) path lag several newer Messages-API features. Three of the four "adopt native feature X" ideas therefore resolve to "can't on our transport / already handled / keep custom." Getting these right means not shipping code that wouldn't run or would silently regress behavior.
P2.b: native advisor tool → keep `consult_advisor` (33fa0eb)

The native advisor tool (beta `advisor-tool-2026-03-01`) can't preserve fab's invariants: it has only a per-request `max_uses` (no per-session budget; per-conversation capping means hand-stripping history), and it's beta on the Claude API / Claude Platform on AWS (CPA) only, not on Bedrock and not in the Managed Agents toolset. Recorded the rationale in `src/advisor.ts` so it isn't silently swapped.

P2.c: Tool Search / `defer_loading` → can't adopt yet; made the cost visible (a42d9e0)

Corrected a wrong plan assumption: 23/83 roles wire ≥4 MCP servers (github alone ~50 tools), so a role can load 100+ tool defs eagerly. But `defer_loading` / Tool Search is Messages-API-only and is not exposed by the Managed Agents API. Added `summarizeToolSurface` + a single deploy-time line surfacing the latent context cost (no per-role noise), and documented the constraint + revisit trigger. Tested, incl. against the live roster.

P2.d: compaction → already automatic; nothing to wire (6e94139)

The installed Agent SDK 0.3.x types show compaction is automatic in the Claude Code loop (`SDKStatus: 'compacting'`, Pre/PostCompact hooks); it's not a `query()` option and the SDK `betas` union only accepts `context-1m-2025-08-07`, so the plan's `compact-2026-01-12` sketch wasn't even expressible. managed-agents handles context via its session log. Recorded at the call site.

P2.e: Haiku/effort tiering → pilot methodology, no blind flips (716abf5)

Documented the Haiku cost opportunity (0 roles on Haiku today), concrete candidate roles, and a data-driven pilot path via the existing `fab model set` override (firm roles skip the gate, so flipping defaults blind is unsafe). `effort` deferred (shape change + not on the managed-agents surface).

Verification

`npm run lint` / `build` / `format:check` clean · `npm test` 270/270 (+3 new).
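As a footnote to P2.b, the difference between a per-request cap and a per-session budget can be sketched as follows. This is not fab's `consult_advisor` implementation, which is not shown in this PR; `createSessionBudget` and its shape are hypothetical, and the loop only simulates requests rather than calling any API.

```typescript
// Hypothetical per-session budget: one counter shared by every request in a session.
interface SessionBudget {
  tryConsume(): boolean; // true if an advisor call is still allowed
  remaining(): number;
}

function createSessionBudget(maxCallsPerSession: number): SessionBudget {
  let used = 0;
  return {
    tryConsume(): boolean {
      if (used >= maxCallsPerSession) return false;
      used += 1;
      return true;
    },
    remaining(): number {
      return maxCallsPerSession - used;
    },
  };
}

// Simulate one session of three requests, each attempting two advisor calls.
// A per-request cap of 2 would permit all six; a session budget of 3 stops after three.
const sessionBudget = createSessionBudget(3);
const advisorOutcomes: boolean[] = [];
for (let request = 0; request < 3; request++) {
  for (let call = 0; call < 2; call++) {
    advisorOutcomes.push(sessionBudget.tryConsume());
  }
}
console.log(advisorOutcomes, sessionBudget.remaining());
```

The counter has to outlive individual requests, which is the state a per-request `max_uses` alone does not carry.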