Define natural agent delegation policy

## Idea

This is not a ready-to-build todo. It is a prompt/product design idea to examine while we work on agent calling.

During the release-notes workflow discussion, the first agent batch used GPT/Claude reviewers and produced useful consensus. Adding Antigravity afterward provided a different framing: the changelog and GitHub Release page have different audiences, so the prompt should optimize for impact/readability rather than mechanically mirroring changelog bullets. That dissenting/orthogonal view improved the final prompt.

## Questions To Explore

- What task classes should default to three opinions vs. one or two agents? Current leaning: three for explicit `ask agents`, planning/design/review/ambiguous work; fewer for narrow mechanical tasks.
- How should token burn rate and provider rate-limit pressure be represented in the selection algorithm? Example policy input: if a provider is consuming budget faster than remaining context/task progress justifies, temporarily cool it off or swap to local/cheaper lanes.
- Which alias surface is safest for Google-family requests: expose `gemini` as an alias for `antigravity`, or keep `antigravity` visible while teaching the harness that Gemini/Google intent maps there?
- How should local agents report uncertainty after a first-pass scan so premium agents can jump directly to the right files/claims?
- Which synthesis format best preserves dissent without making every agent run verbose?

## Useful Evidence

This would be a good place to examine real-world agent usage through rollout file scanning. Look for patterns such as:

- which model sets are commonly selected together
- whether agents are used mostly for consensus, implementation, review, or dissent
- whether Antigravity or other cross-family agents are underused in subjective/product/prompt tasks
- whether unique minority recommendations correlate with later user corrections or better outcomes
- how often agent result summaries surface disagreement versus flattening it into consensus

## Why This Might Matter

Agent diversity helped catch a subtle quality issue: a hard bullet-count prompt was enforcing shape instead of asking the model to use judgment. The resulting release-note prompt became less brittle while the release validator stayed structural.

The goal is not to automatically add more agents everywhere. The goal is to understand when a small amount of intentional diversity or dissent improves decisions enough to be worth the latency and cost.

## Current Status

State: Active after PR #265 merged on 2026-05-31.

Done this session:
- `gemini`/`google` intent aliases now resolve to canonical `antigravity` for the Google/Gemini-family lane.
- Model-visible guidance now nudges explicit multi-agent/dissent requests toward diverse GPT + Claude + Antigravity batches when useful and budget allows.
- Exec-path coverage preserves the manual Gemini CLI escape hatch: custom `command = "gemini"` configs do not inherit Antigravity-only argv.

Next action: Define and implement the broader natural agent delegation policy for the primary Every Code agent, separate from Auto Drive routing.

Remaining policy work:
- Decide default fanout for explicit "ask agents" / dissent requests, including when two opinions are enough.
- Add token/rate-limit burn-rate awareness so expensive providers cool off when budget is under pressure.
- Define local/private provider routing for huge-file first passes, secret-sensitive scans, log summarization, and context narrowing.
- Preserve dissent/minority arguments in synthesis instead of flattening immediately to consensus.

## Acceptance Criteria

- [ ] Define natural agent selection heuristics for the primary Every Code agent, independent of Auto Drive.
- [ ] Explicit `ask agents` requests normally produce three opinions when useful and budget allows; fewer agents are acceptable for narrow/mechanical work.
- [ ] Diverse batches prefer GPT + Claude + Google/Antigravity when available, especially for planning, design, prompt, review, and ambiguous problem-solving tasks.
- [ ] Token/rate-limit budget is part of selection: cool off providers or reduce fanout when burn rate is too high relative to remaining context/budget.
- [ ] Local/private providers are preferred for expensive first-pass scans, huge files, private/secret-sensitive inspection, log summarization, and context narrowing before premium models are asked to reason.
- [ ] The policy distinguishes canonical selector names from intent aliases: `antigravity` is canonical; `gemini`/`google` can resolve to it as aliases with a clear caveat that AGY uses its configured model.
- [ ] Agent result synthesis surfaces dissent and unique minority arguments before flattening consensus.
- [ ] If an obvious family/provider is skipped, the main agent briefly records why.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define natural agent delegation policy #212

Idea

Questions To Explore

Useful Evidence

Why This Might Matter

Current Status

Acceptance Criteria

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Define natural agent delegation policy #212

Description

Idea

Questions To Explore

Useful Evidence

Why This Might Matter

Current Status

Acceptance Criteria

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions