Skip to content

Define natural agent delegation policy #212

@cbusillo

Description

@cbusillo

Idea

This is not a ready-to-build todo. It is a prompt/product design idea to examine while we work on agent calling.

During the release-notes workflow discussion, the first agent batch used GPT/Claude reviewers and produced useful consensus. Adding Antigravity afterward provided a different framing: the changelog and GitHub Release page have different audiences, so the prompt should optimize for impact/readability rather than mechanically mirroring changelog bullets. That dissenting/orthogonal view improved the final prompt.

Questions To Explore

  • What task classes should default to three opinions vs. one or two agents? Current leaning: three for explicit ask agents, planning/design/review/ambiguous work; fewer for narrow mechanical tasks.
  • How should token burn rate and provider rate-limit pressure be represented in the selection algorithm? Example policy input: if a provider is consuming budget faster than remaining context/task progress justifies, temporarily cool it off or swap to local/cheaper lanes.
  • Which alias surface is safest for Google-family requests: expose gemini as an alias for antigravity, or keep antigravity visible while teaching the harness that Gemini/Google intent maps there?
  • How should local agents report uncertainty after a first-pass scan so premium agents can jump directly to the right files/claims?
  • Which synthesis format best preserves dissent without making every agent run verbose?

Useful Evidence

This would be a good place to examine real-world agent usage through rollout file scanning. Look for patterns such as:

  • which model sets are commonly selected together
  • whether agents are used mostly for consensus, implementation, review, or dissent
  • whether Antigravity or other cross-family agents are underused in subjective/product/prompt tasks
  • whether unique minority recommendations correlate with later user corrections or better outcomes
  • how often agent result summaries surface disagreement versus flattening it into consensus

Why This Might Matter

Agent diversity helped catch a subtle quality issue: a hard bullet-count prompt was enforcing shape instead of asking the model to use judgment. The resulting release-note prompt became less brittle while the release validator stayed structural.

The goal is not to automatically add more agents everywhere. The goal is to understand when a small amount of intentional diversity or dissent improves decisions enough to be worth the latency and cost.

Current Status

State: Active after PR #265 merged on 2026-05-31.

Done this session:

  • gemini/google intent aliases now resolve to canonical antigravity for the Google/Gemini-family lane.
  • Model-visible guidance now nudges explicit multi-agent/dissent requests toward diverse GPT + Claude + Antigravity batches when useful and budget allows.
  • Exec-path coverage preserves the manual Gemini CLI escape hatch: custom command = "gemini" configs do not inherit Antigravity-only argv.

Next action: Define and implement the broader natural agent delegation policy for the primary Every Code agent, separate from Auto Drive routing.

Remaining policy work:

  • Decide default fanout for explicit "ask agents" / dissent requests, including when two opinions are enough.
  • Add token/rate-limit burn-rate awareness so expensive providers cool off when budget is under pressure.
  • Define local/private provider routing for huge-file first passes, secret-sensitive scans, log summarization, and context narrowing.
  • Preserve dissent/minority arguments in synthesis instead of flattening immediately to consensus.

Acceptance Criteria

  • Define natural agent selection heuristics for the primary Every Code agent, independent of Auto Drive.
  • Explicit ask agents requests normally produce three opinions when useful and budget allows; fewer agents are acceptable for narrow/mechanical work.
  • Diverse batches prefer GPT + Claude + Google/Antigravity when available, especially for planning, design, prompt, review, and ambiguous problem-solving tasks.
  • Token/rate-limit budget is part of selection: cool off providers or reduce fanout when burn rate is too high relative to remaining context/budget.
  • Local/private providers are preferred for expensive first-pass scans, huge files, private/secret-sensitive inspection, log summarization, and context narrowing before premium models are asked to reason.
  • The policy distinguishes canonical selector names from intent aliases: antigravity is canonical; gemini/google can resolve to it as aliases with a clear caveat that AGY uses its configured model.
  • Agent result synthesis surfaces dissent and unique minority arguments before flattening consensus.
  • If an obvious family/provider is skipped, the main agent briefly records why.

Metadata

Metadata

Assignees

No one assigned

    Labels

    planDurable planning issueplan:activeCurrent active plan

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions