Intelligent Model-Routing Agent (ADK 2.0)

A four-tier intelligent routing agent built on the Google Antigravity (AGY) SDK / ADK 2.0 for Gemini Enterprise. It classifies each user query in real time with Gemini 3.1 Flash-Lite (gemini-3.1-flash-lite), then deterministically dispatches it to one of four models served via Model Garden on the Agent Platform: Claude Haiku (claude-haiku-4-5), Claude Sonnet (claude-sonnet-4-6), Claude Opus (claude-opus-4-8), or Claude Fable (claude-fable-5).

Architecture & Workflow

The routing agent is defined as a sequential Workflow graph containing guards, a classifier, and a deterministic dispatch engine.

```mermaid
graph TD
    START[User Input] --> CG[context_guard_node]
    CG -->|Extract token length| COMP[compliance_guard_node]
    COMP -->|Check ZDR / Fable Refusal| CLAS[classifier_agent]
    CLAS -->|Structured routing decision| DISP[dispatch_node]
    DISP -->|claude_haiku| HAIKU[claude_haiku_specialist]
    DISP -->|claude_sonnet| SONNET[claude_sonnet_specialist]
    DISP -->|claude_opus| OPUS[claude_opus_specialist]
    DISP -->|claude_fable| FABLE[claude_fable_specialist]
    HAIKU --> END[Response]
    SONNET --> END
    OPUS --> END
    FABLE --> END
```

Routing Stages:

context_guard_node: Estimates the token length of the input. If the estimate exceeds LONG_CONTEXT_ROUTE_THRESHOLD (200,000 tokens), it sets long_context = True in the session state delta.
compliance_guard_node: Checks the text for sensitive keywords (ZDR markers such as phi, gdpr, ssn, pii) and for domains where Fable has a high rate of false-positive safety refusals (e.g., cybersecurity, medical genetics, malware). It sets the zdr_required and fable_refusal flags.
classifier_agent: A Gemini 3.1 Flash-Lite LlmAgent that analyzes query complexity, latency sensitivity, and token volume, then outputs structured JSON conforming to RouteResultSchema that selects the target tier.
dispatch_node: A deterministic Python function node (not an LLM). If the classifier returns an invalid tier, it defaults to claude_haiku. It then applies these deterministic overrides:
- Context Overflow: If long_context = True and the chosen tier is claude_haiku, the route is overridden to claude_sonnet.
- Compliance/ZDR Safeguards: If zdr_required = True or fable_refusal = True, a route that selected claude_fable is overridden to claude_opus.
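
The guard flags and dispatch overrides described above can be sketched in plain Python. This is a minimal illustration, not the project's code: guard_flags, dispatch, and the marker tuples are hypothetical names, the 4-characters-per-token estimate is an assumption, and the real nodes read and write ADK session state rather than plain dicts.

```python
import re

LONG_CONTEXT_ROUTE_THRESHOLD = 200_000  # tokens, as stated in the stage list
VALID_TIERS = {"claude_haiku", "claude_sonnet", "claude_opus", "claude_fable"}

# Illustrative subsets only; the full marker lists are not given in this README.
ZDR_MARKERS = ("phi", "gdpr", "ssn", "pii")
FABLE_REFUSAL_DOMAINS = ("cybersecurity", "medical genetics", "malware")


def guard_flags(text: str) -> dict:
    """Compute the three flags the two guard nodes are described as setting."""
    lowered = text.lower()
    est_tokens = len(text) // 4  # assumption: roughly 4 characters per token
    return {
        "long_context": est_tokens > LONG_CONTEXT_ROUTE_THRESHOLD,
        # Whole-word match so that "phi" does not fire on "philosophy".
        "zdr_required": any(
            re.search(rf"\b{re.escape(marker)}\b", lowered) for marker in ZDR_MARKERS
        ),
        "fable_refusal": any(domain in lowered for domain in FABLE_REFUSAL_DOMAINS),
    }


def dispatch(chosen_tier: str, state: dict) -> str:
    """Apply the default and the two deterministic overrides, in order."""
    # Invalid classifier output falls back to the cheapest tier.
    tier = chosen_tier if chosen_tier in VALID_TIERS else "claude_haiku"
    # Context overflow: Haiku is bumped to Sonnet for long inputs.
    if state.get("long_context") and tier == "claude_haiku":
        tier = "claude_sonnet"
    # Compliance: Fable is never used when either guard flag is set.
    if tier == "claude_fable" and (
        state.get("zdr_required") or state.get("fable_refusal")
    ):
        tier = "claude_opus"
    return tier
```

Because the overrides run after the classifier, a wrong or malformed classification can change cost and quality but cannot route flagged content to Fable.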

Model Routing Matrix

| Tier Name | Target Model ID | Cost (input / output, USD per 1M tokens) | Primary Use Cases |
| --- | --- | --- | --- |
| Classifier | `gemini-3.1-flash-lite` | $0.25 / $1.50 | Routing brain & structured classification |
| `claude_haiku` | `claude-haiku-4-5` | $1.00 / $5.00 | Simple facts, greetings, summaries, short conversations |
| `claude_sonnet` | `claude-sonnet-4-6` | $3.30 / $16.50 | Code generation, technical writing, document analysis |
| `claude_opus` | `claude-opus-4-8` | $5.50 / $27.50 | Deep mathematical reasoning, complex logic |
| `claude_fable` | `claude-fable-5` | $10.00 / $50.00 | Frontier tasks, long-horizon multi-step deliverables |
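
To make the price differences concrete, the sketch below estimates the cost of one routed request from the prices in the matrix. request_cost and routed_cost are illustrative helpers, and the 50-token classifier output is an assumption, not a measured figure.

```python
# USD per 1M tokens as (input, output), copied from the routing matrix.
PRICES = {
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-6": (3.30, 16.50),
    "claude-opus-4-8": (5.50, 27.50),
    "claude-fable-5": (10.00, 50.00),
}


def request_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single call to one model."""
    input_price, output_price = PRICES[model_id]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def routed_cost(
    target_model: str,
    input_tokens: int,
    output_tokens: int,
    classifier_output_tokens: int = 50,  # assumption: small JSON routing decision
) -> float:
    """Classifier call plus target call; the classifier reads the same input."""
    classifier = request_cost(
        "gemini-3.1-flash-lite", input_tokens, classifier_output_tokens
    )
    return classifier + request_cost(target_model, input_tokens, output_tokens)
```

Under these assumptions, a 2,000-token prompt with a 500-token answer costs about $0.0051 when routed to Haiku and about $0.0253 when routed to Opus, classifier call included.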

Key Design Decisions

Session Path Extraction (FR-18): Gemini Enterprise provides session identifiers as fully qualified resource paths (e.g., projects/.../engines/.../sessions/session_123). Since ADK's built-in session validation regex (^[A-Za-z0-9_-]+$) rejects /, the application extracts the terminal identifier (session_123) at the controller entrypoint boundary before passing it to SessionService. No fragile monkeypatches of internal SDK regexes are used.
Native State Updates (FR-19): Graph state changes are passed through the state_delta argument of the runner's native run_async / run methods.
Self-Test on Startup (FR-21 / RK-1): At startup, the application introspects Runner.run_async to verify that state_delta is supported, and fails fast if the ADK runner contract has drifted.
Escalation Cascade (FR-13): The harness-level fallback cascade automatically elevates failing queries to the next tier on retryable errors (e.g., 429 rate limits, 5xx, timeouts). Loop safety is guaranteed by bounding the hops to MAX_ESCALATIONS_PER_TURN = 1 (FR-15).
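
Three of the decisions above are small enough to sketch directly. The code below is illustrative only: extract_session_id, supports_state_delta, run_with_escalation, RetryableError, and the TIER_ORDER list are hypothetical names, and the tier ordering is an assumption. The sketch also ignores the compliance flags, which a real cascade would need to respect before escalating toward Fable.

```python
import inspect
import re

SESSION_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")  # regex quoted in FR-18
MAX_ESCALATIONS_PER_TURN = 1  # FR-15
# Assumption: "next tier" means the next entry in this cheapest-first order.
TIER_ORDER = ["claude_haiku", "claude_sonnet", "claude_opus", "claude_fable"]


class RetryableError(Exception):
    """Hypothetical stand-in for 429, 5xx, and timeout failures."""


def extract_session_id(raw: str) -> str:
    """Return the terminal path segment of a session resource path."""
    candidate = raw.rstrip("/").rsplit("/", 1)[-1]
    if not SESSION_ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"invalid session id: {raw!r}")
    return candidate


def supports_state_delta(runner_cls) -> bool:
    """Startup self-test: does run_async accept a state_delta parameter?"""
    return "state_delta" in inspect.signature(runner_cls.run_async).parameters


def run_with_escalation(tier: str, call_model):
    """Call the model, escalating at most MAX_ESCALATIONS_PER_TURN times."""
    hops = 0
    while True:
        try:
            return tier, call_model(tier)
        except RetryableError:
            index = TIER_ORDER.index(tier)
            # Stop when the hop budget is spent or there is no higher tier.
            if hops >= MAX_ESCALATIONS_PER_TURN or index + 1 >= len(TIER_ORDER):
                raise
            tier = TIER_ORDER[index + 1]
            hops += 1
```

With the bound set to 1, a turn makes at most two model calls, so a persistent outage surfaces as an error instead of walking the whole tier list.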

Setup & Installation

Follow these steps to set up the project on your local machine.

1. Initialize Virtual Environment

Python 3.12+ is recommended for runtime and dependency alignment.

```
python3 -m venv .venv
source .venv/bin/activate
```

2. Install Dependencies

Install the required packages pinned in requirements.txt:

```
pip install --upgrade pip
pip install -r requirements.txt
```

3. Troubleshooting OpenTelemetry Dependency Conflicts

google-adk depends on specific versions of the OpenTelemetry packages. A mismatch between opentelemetry-api and opentelemetry-sdk can trigger an ImportError or a dependency-resolution error during installation.

Verification: Confirm that both packages are installed at exactly the same version.
```
pip show opentelemetry-api opentelemetry-sdk
```
Resolution: If the versions differ, install the same version of both packages, for example 1.41.1 (any aligned pair works):
```
pip install "opentelemetry-api==1.41.1" "opentelemetry-sdk==1.41.1"
```
Pydantic Warnings: Ensure pydantic==2.13.4 and pydantic-core==2.46.4 are pinned to avoid base-image serialization failures when deploying to Agent Platform (Bug #6041).
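
The same checks can be run from Python with the standard library's importlib.metadata. This is a hedged sketch rather than a script shipped with the project: installed_versions and find_problems are hypothetical helpers, and the pinned values are simply the ones quoted in this section.

```python
from importlib import metadata

OTEL_PACKAGES = ("opentelemetry-api", "opentelemetry-sdk")
PYDANTIC_PINS = {"pydantic": "2.13.4", "pydantic-core": "2.46.4"}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_problems(versions):
    """Return human-readable problems; an empty list means the pins line up."""
    problems = []
    otel = {versions.get(name) for name in OTEL_PACKAGES}
    if None in otel or len(otel) != 1:
        problems.append("opentelemetry-api / opentelemetry-sdk missing or mismatched")
    for name, pinned in PYDANTIC_PINS.items():
        if versions.get(name) != pinned:
            problems.append(f"{name}: found {versions.get(name)}, expected {pinned}")
    return problems


if __name__ == "__main__":
    found = installed_versions(OTEL_PACKAGES + tuple(PYDANTIC_PINS))
    for line in find_problems(found) or ["all pins look consistent"]:
        print(line)
```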

4. GCP & Application Default Credentials (ADC) Setup

Ensure you have the Google Cloud SDK installed. Authorize using Application Default Credentials (ADC):

```
gcloud auth application-default login
```

Set your target GCP project:

```
export GOOGLE_CLOUD_PROJECT="YOUR_PROJECT_ID"
export GOOGLE_CLOUD_PROJECT_NUMBER="YOUR_PROJECT_NUMBER"
```

Environment Configuration

Configure the agent using the following environment variables:

| Variable Name | Default Value | Purpose |
| --- | --- | --- |
| `ROUTER_CLASSIFIER_MODEL` | `gemini-3.1-flash-lite` | The LLM used to classify incoming queries. |
| `USE_SONNET_CLASSIFIER` | `0` (False) | Set to `1` to override and use Claude Sonnet as the classifier. |
| `ENABLE_SEARCH_GROUNDING` | `0` (False) | Set to `1` to enable real-time search grounding (Haiku only). |
| `ROUTER_SEARCH_MODEL` | `gemini-3.5-flash` | The sub-agent model performing Google searches. |
| `OPUS_SHARE_ALERT_THRESHOLD` | `0.20` | Maximum share of traffic routed to Opus (0.20 = 20%) before cost-alert logs are emitted. |
| `DISABLE_GCP_LOGGING` | Not set | Set to `TRUE` to disable sending logs to Google Cloud Logging. |
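
As an illustration of how these variables and defaults might be read in one place, here is a minimal sketch. RouterConfig and load_config are hypothetical names, and the exact parsing rules (for example, treating only the literal string 1 as true) are assumptions based on the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    classifier_model: str
    use_sonnet_classifier: bool
    enable_search_grounding: bool
    search_model: str
    opus_share_alert_threshold: float
    disable_gcp_logging: bool


def load_config(env=None) -> RouterConfig:
    """Build a config from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return RouterConfig(
        classifier_model=env.get("ROUTER_CLASSIFIER_MODEL", "gemini-3.1-flash-lite"),
        use_sonnet_classifier=env.get("USE_SONNET_CLASSIFIER", "0") == "1",
        enable_search_grounding=env.get("ENABLE_SEARCH_GROUNDING", "0") == "1",
        search_model=env.get("ROUTER_SEARCH_MODEL", "gemini-3.5-flash"),
        opus_share_alert_threshold=float(
            env.get("OPUS_SHARE_ALERT_THRESHOLD", "0.20")
        ),
        # Unset means logging to Google Cloud Logging stays enabled.
        disable_gcp_logging=env.get("DISABLE_GCP_LOGGING", "") == "TRUE",
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to exercise in unit tests.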

Deployment

Deploy to Agent Platform Agent Engine: Run the deployment script. It compiles the ADK app and pushes it to Agent Platform.
```
python deploy.py
```
Output: The script logs the resource name of the deployed Reasoning Engine. Copy this identifier: projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>

Save Resource Name: Export the full resource name as an environment variable for testing and updates:

```
export REASONING_ENGINE_RESOURCE_NAME="projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>"
```
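
A common mistake is exporting the template with its placeholders still in place. The sketch below validates the value before it is used; parse_reasoning_engine_name is a hypothetical helper, not part of the project's scripts, and the pattern is inferred from the resource name format shown above.

```python
import re

# Segments may not contain "/", "<", or ">" so unreplaced placeholders are rejected.
_RESOURCE_RE = re.compile(
    r"^projects/(?P<project>[^/<>]+)"
    r"/locations/(?P<location>[^/<>]+)"
    r"/reasoningEngines/(?P<engine_id>[^/<>]+)$"
)


def parse_reasoning_engine_name(name: str) -> dict:
    """Split a Reasoning Engine resource name into its three components."""
    match = _RESOURCE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid Reasoning Engine resource name: {name!r}")
    return match.groupdict()
```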

Gemini Enterprise Registration

Register the deployed Reasoning Engine in Gemini Enterprise via the Discovery Engine API:

```
export APP_ID="your_gemini_enterprise_app_id"
export REASONING_ENGINE="projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>"
bash register_ge.sh
```

Verify that the agent appears in your Gemini Enterprise UI dashboard.

Interactive Testing & CLI Client

You can run queries against the deployed Agent Platform engine using query_agent.py.

Single-query mode:

```
python query_agent.py "Prove that the sum of the first n odd numbers is n^2."
```

Interactive mode:
```
python query_agent.py
```
Type your queries at the prompt. The CLI will stream the responses and print routing transparency metadata (latency, chosen model, reasoning).

Updating & Rollback

In-Place Updating

If you modify code in router_agent/, you can update the active Reasoning Engine without changing its resource ID:

```
export REASONING_ENGINE_RESOURCE_NAME="projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>"
python update.py
```

Rollback

To roll back, re-run the register_ge.sh script pointing to the previous Reasoning Engine resource name.

Testing Suite

Run tests locally using pytest.

Run unit tests (fast, offline):
```
pytest -m unit
```
Run integration tests (local InMemoryRunner and mock endpoints; this selects every test not marked live):
```
pytest -m "not live"
```
Run live tests (calls real Agent Platform and Model Garden endpoints; requires active internet and ADC authentication):
```
pytest --run-live
```
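
The commands above imply custom unit and live markers plus a --run-live flag. One common way to wire these up, following the pattern in the pytest documentation, is a conftest.py like the sketch below. The project's actual conftest.py is not shown in this README, so treat the marker descriptions and skip behavior as assumptions.

```python
# conftest.py (illustrative sketch)
import pytest


def pytest_addoption(parser):
    # Registers the custom command-line flag used for live runs.
    parser.addoption(
        "--run-live",
        action="store_true",
        default=False,
        help="run tests that call real Agent Platform and Model Garden endpoints",
    )


def pytest_configure(config):
    # Declaring markers avoids unknown-marker warnings.
    config.addinivalue_line("markers", "unit: fast, offline tests")
    config.addinivalue_line("markers", "live: tests that hit real endpoints")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-live"):
        return  # live tests run normally
    skip_live = pytest.mark.skip(reason="needs --run-live to run")
    for item in items:
        if "live" in item.keywords:
            item.add_marker(skip_live)
```

With this arrangement a plain pytest run skips live tests instead of failing when credentials are missing.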

Evaluation

The routing-accuracy evaluation suite measures classification effectiveness against a golden dataset representing standard enterprise use cases.

Run the evaluation:

```
python eval/run_eval.py
```

Metrics & Targets:

Golden Dataset (eval/golden_dataset.jsonl): Seeded with annotated queries for all four tiers.
Routing Accuracy (G-1): Target >= 85% exact-match accuracy.
Opus Routing Precision (G-6): Target >= 80% precision for queries routed to Opus, to control costs.
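
For reference, the two metrics can be computed as follows. This is a sketch under the usual definitions (exact-match accuracy, and precision as the share of Opus-routed queries whose label is Opus); routing_accuracy and tier_precision are hypothetical names, and how eval/run_eval.py actually computes them is not shown here.

```python
from typing import Optional, Sequence


def routing_accuracy(expected: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of queries routed to exactly the annotated tier."""
    if not expected or len(expected) != len(predicted):
        raise ValueError("expected and predicted must be non-empty and equal length")
    hits = sum(e == p for e, p in zip(expected, predicted))
    return hits / len(expected)


def tier_precision(
    expected: Sequence[str], predicted: Sequence[str], tier: str = "claude_opus"
) -> Optional[float]:
    """Of the queries routed to `tier`, the fraction annotated as `tier`.

    Returns None when nothing was routed to the tier (precision is undefined).
    """
    labels_of_routed = [e for e, p in zip(expected, predicted) if p == tier]
    if not labels_of_routed:
        return None
    return sum(label == tier for label in labels_of_routed) / len(labels_of_routed)
```

Low Opus precision means cheaper-tier queries are being sent to Opus, which is why the target is framed as a cost control.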

⚠️ Data Retention Policy (Claude Fable 5)

Claude Fable 5 (claude-fable-5) is integrated as the frontier reasoning tier.

Mandatory Retention: Vertex AI enforces a 30-day data retention policy on Fable for safety monitoring.
ZDR Restriction: Zero Data Retention (ZDR) is not supported on Fable.
PII/HIPAA Compliance: Workloads with strict compliance constraints must not use Fable. The compliance guard automatically redirects queries containing sensitive data markers to claude_opus.
Disabling Fable: If 30-day data retention conflicts with your corporate governance policies, disable the claude_fable tier. The router will automatically fall back to claude-opus-4-8 for frontier tasks.

Search Grounding (FR-9, optional, Haiku-only)

Real-time Google Search grounding is disabled by default. If enabled:

Scope: Restricted to the Haiku tier only, to optimize performance and prevent tool usage loops in frontier models.
Execution: Grounding uses google_search isolated in a dedicated search_subagent (running Gemini 3.5 Flash). The main Haiku agent executes it as a tool via the documented ADK AgentTool pattern.

Activation:

```
export ENABLE_SEARCH_GROUNDING=1
export ROUTER_SEARCH_MODEL=gemini-3.5-flash
```
