A four-tier intelligent routing agent built on the Google Antigravity (AGY) SDK / ADK 2.0 for Gemini Enterprise. It classifies user queries in real time using Gemini 3.1 Flash-Lite (gemini-3.1-flash-lite) and deterministically dispatches them to Claude Haiku (claude-haiku-4-5), Claude Sonnet (claude-sonnet-4-6), Claude Opus (claude-opus-4-8), or Claude Fable (claude-fable-5) via Model Garden on the Agent Platform.
The routing agent is defined as a sequential Workflow graph containing guards, a classifier, and a deterministic dispatch engine.
graph TD
START[User Input] --> CG[context_guard_node]
CG -->|Extract token length| COMP[compliance_guard_node]
COMP -->|Check ZDR / Fable Refusal| CLAS[classifier_agent]
CLAS -->|Structured routing decision| DISP[dispatch_node]
DISP -->|claude_haiku| HAIKU[claude_haiku_specialist]
DISP -->|claude_sonnet| SONNET[claude_sonnet_specialist]
DISP -->|claude_opus| OPUS[claude_opus_specialist]
DISP -->|claude_fable| FABLE[claude_fable_specialist]
HAIKU --> END[Response]
SONNET --> END
OPUS --> END
FABLE --> END
- `context_guard_node`: Estimates the text length. If the query exceeds `LONG_CONTEXT_ROUTE_THRESHOLD` (200,000 tokens), it sets `long_context = True` in the session state delta.
- `compliance_guard_node`: Checks the text for sensitive keywords (ZDR markers like `phi`, `gdpr`, `ssn`, `pii`, etc.) and domains that trigger high false-positive safety refusals in Fable (e.g., `cybersecurity`, `medical genetics`, `malware`). It sets the `zdr_required` and `fable_refusal` flags.
- `classifier_agent`: A Gemini 3.1 Flash-Lite `LlmAgent` that analyzes query complexity, latency sensitivity, and token volume to output a structured JSON schema (`RouteResultSchema`) selecting the target tier.
- `dispatch_node`: A deterministic Python function node (not an LLM). If the classifier makes an invalid selection, it defaults to `claude_haiku`. It applies deterministic overrides:
  - Context Overflow: If `long_context = True` and the chosen tier was `claude_haiku`, the route is overridden to `claude_sonnet`.
  - Compliance/ZDR Safeguards: If `zdr_required = True` or `fable_refusal = True`, it overrides routes selecting `claude_fable` to `claude_opus`.
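The override rules described for `dispatch_node` can be sketched as a pure function. This is a minimal illustration, not the project's actual node: `GuardFlags` and `resolve_route` are hypothetical names, and the order in which the fallback and the two overrides are applied is an assumption.

```python
from dataclasses import dataclass

VALID_TIERS = ("claude_haiku", "claude_sonnet", "claude_opus", "claude_fable")
DEFAULT_TIER = "claude_haiku"


@dataclass(frozen=True)
class GuardFlags:
    """Flags set by the guard nodes (hypothetical container for this sketch)."""
    long_context: bool = False
    zdr_required: bool = False
    fable_refusal: bool = False


def resolve_route(classifier_choice: str, flags: GuardFlags) -> str:
    """Apply the deterministic overrides to the classifier's selection."""
    # An invalid or unknown selection falls back to the default tier.
    route = classifier_choice if classifier_choice in VALID_TIERS else DEFAULT_TIER

    # Context overflow: long inputs routed to Haiku are moved to Sonnet.
    if flags.long_context and route == "claude_haiku":
        route = "claude_sonnet"

    # Compliance: Fable is replaced by Opus when either flag is set.
    if (flags.zdr_required or flags.fable_refusal) and route == "claude_fable":
        route = "claude_opus"

    return route
```

Because the function has no side effects and no model calls, every combination of flags and classifier outputs can be covered by plain unit tests.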
| Tier Name | Target Model ID | Cost (USD, input / output per 1M tokens) | Primary Typology |
|---|---|---|---|
| Classifier | `gemini-3.1-flash-lite` | $0.25 / $1.50 | Routing brain & structured classification |
| `claude_haiku` | `claude-haiku-4-5` | $1.00 / $5.00 | Simple facts, greetings, summaries, short conversations |
| `claude_sonnet` | `claude-sonnet-4-6` | $3.30 / $16.50 | Code generation, technical writing, document analysis |
| `claude_opus` | `claude-opus-4-8` | $5.50 / $27.50 | Deep mathematical reasoning, complex logic |
| `claude_fable` | `claude-fable-5` | $10.00 / $50.00 | Frontier tasks, long-horizon multi-step deliverables |
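To make the price differences concrete, the per-request cost can be estimated from the table. The sketch below copies the listed prices; `request_cost` is a hypothetical helper, and real billing may differ (caching, batching, or regional pricing are not modeled).

```python
# USD per 1M tokens as (input, output), copied from the table above.
PRICES = {
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-6": (3.30, 16.50),
    "claude-opus-4-8": (5.50, 27.50),
    "claude-fable-5": (10.00, 50.00),
}


def request_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request to the given model."""
    input_price, output_price = PRICES[model_id]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, 2,000 input tokens and 500 output tokens on `claude-sonnet-4-6` come to (2,000 × 3.30 + 500 × 16.50) / 1,000,000 = 0.01485 USD, while the same request on `claude-fable-5` comes to 0.045 USD.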
- Session Path Extraction (FR-18): Gemini Enterprise provides session identifiers as fully qualified resource paths (e.g., `projects/.../engines/.../sessions/session_123`). Since ADK's built-in session validation regex (`^[A-Za-z0-9_-]+$`) rejects `/`, the application extracts the terminal identifier (`session_123`) at the controller entrypoint boundary before passing it to `SessionService`. No fragile monkeypatches of internal SDK regexes are used.
- Native State Updates (FR-19): Graph state changes are set via `state_delta` on the native `run_async`/`run` runner method calls.
- Self-Test on Startup (FR-21 / RK-1): Introspects `Runner.run_async` at startup to verify `state_delta` is supported. Fails fast if the ADK runner contract drifts.
- Escalation Cascade (FR-13): The harness-level fallback cascade automatically elevates failing queries to the next tier on retryable errors (e.g. 429 rate limits, 5xx, timeouts). Loop safety is guaranteed by bounding the hops to `MAX_ESCALATIONS_PER_TURN = 1` (FR-15).
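Three of the mechanisms above are small enough to sketch in plain Python. The function names are hypothetical, the tier order used for escalation is an assumption, and the signature check is written against any callable rather than the real ADK `Runner`, so none of this should be read as the project's actual code.

```python
import inspect
import re
from typing import Optional

SESSION_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")  # regex quoted in the text
MAX_ESCALATIONS_PER_TURN = 1
# Assumed escalation order, cheapest to most capable.
TIER_ORDER = ["claude_haiku", "claude_sonnet", "claude_opus", "claude_fable"]


def extract_session_id(raw: str) -> str:
    """Return the terminal segment of a session resource path."""
    session_id = raw.rstrip("/").rsplit("/", 1)[-1]
    if not SESSION_ID_PATTERN.fullmatch(session_id):
        raise ValueError(f"unsupported session id: {session_id!r}")
    return session_id


def supports_state_delta(run_method) -> bool:
    """Report whether a callable declares a `state_delta` parameter."""
    return "state_delta" in inspect.signature(run_method).parameters


def next_tier(current: str, escalations_so_far: int) -> Optional[str]:
    """Return the tier to retry on, or None when no escalation is allowed."""
    if escalations_so_far >= MAX_ESCALATIONS_PER_TURN:
        return None  # hop budget for this turn is spent
    index = TIER_ORDER.index(current)
    if index + 1 >= len(TIER_ORDER):
        return None  # already on the top tier
    return TIER_ORDER[index + 1]
```

With these definitions, `extract_session_id("projects/.../engines/.../sessions/session_123")` yields `session_123`, and a startup check could raise an error when `supports_state_delta` returns `False`. The escalation helper ignores the compliance flags; a real cascade would presumably need to keep ZDR-flagged queries off Fable.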
Follow these steps to set up the project on your local machine.
It is highly recommended to use Python 3.12+ for runtime and dependency alignment.
python3 -m venv .venv
source .venv/bin/activate

Install the required packages pinned in `requirements.txt`:
pip install --upgrade pip
pip install -r requirements.txt

`google-adk` depends on specific versions of OpenTelemetry packages. Mismatches between `opentelemetry-api` and `opentelemetry-sdk` will trigger `ImportError` or installation errors during package resolution.
- Verification: Ensure both libraries are pinned to the exact same version.
  pip show opentelemetry-api opentelemetry-sdk
- Resolution: If a version conflict occurs, force install version `1.41.1` (or another aligned version) to resolve the mismatch:
  pip install "opentelemetry-api==1.41.1" "opentelemetry-sdk==1.41.1"
- Pydantic Warnings: Ensure `pydantic==2.13.4` and `pydantic-core==2.46.4` are pinned to avoid base-image serialization failures when deploying to Agent Platform (Bug #6041).
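The version check can also be scripted instead of read off `pip show` by eye. The sketch below uses the standard library's `importlib.metadata`; the helper names are hypothetical, and it only compares version strings for equality rather than resolving dependency ranges.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def versions_aligned(versions):
    """True only when every package is installed at one identical version."""
    found = set(versions.values())
    return None not in found and len(found) == 1


if __name__ == "__main__":
    current = installed_versions(("opentelemetry-api", "opentelemetry-sdk"))
    status = "aligned" if versions_aligned(current) else "MISMATCH"
    print(current, status)
```

Running it inside the activated virtual environment prints the detected versions and whether they match, which makes it easy to wire into a pre-deploy check.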
Ensure you have the Google Cloud SDK installed. Authorize using Application Default Credentials (ADC):
gcloud auth application-default login

Set your target GCP project:
export GOOGLE_CLOUD_PROJECT="YOUR_PROJECT_ID"
export GOOGLE_CLOUD_PROJECT_NUMBER="YOUR_PROJECT_NUMBER"

Configure the agent using the following environment variables:
| Variable Name | Default Value | Purpose |
|---|---|---|
| `ROUTER_CLASSIFIER_MODEL` | `gemini-3.1-flash-lite` | The LLM used to classify incoming queries. |
| `USE_SONNET_CLASSIFIER` | `0` (False) | Set to `1` to override and use Claude Sonnet as the classifier. |
| `ENABLE_SEARCH_GROUNDING` | `0` (False) | Set to `1` to enable real-time search grounding (Haiku only). |
| `ROUTER_SEARCH_MODEL` | `gemini-3.5-flash` | The sub-agent model performing Google searches. |
| `OPUS_SHARE_ALERT_THRESHOLD` | `0.20` | Ceiling of Opus traffic share before triggering cost alert logs. |
| `DISABLE_GCP_LOGGING` | Not set | Set to `TRUE` to disable sending logs to Google Cloud Logging. |
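One way to read these variables with the documented defaults is a small typed loader. This is a sketch under assumptions: `RouterConfig` and `load_config` are hypothetical names, and the parsing rules (only `1` enables a flag, `TRUE` is matched case-insensitively) are guesses at how the project interprets the values.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    classifier_model: str
    use_sonnet_classifier: bool
    enable_search_grounding: bool
    search_model: str
    opus_share_alert_threshold: float
    disable_gcp_logging: bool


def load_config(env=None) -> RouterConfig:
    """Build the config from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return RouterConfig(
        classifier_model=env.get("ROUTER_CLASSIFIER_MODEL", "gemini-3.1-flash-lite"),
        use_sonnet_classifier=env.get("USE_SONNET_CLASSIFIER", "0") == "1",
        enable_search_grounding=env.get("ENABLE_SEARCH_GROUNDING", "0") == "1",
        search_model=env.get("ROUTER_SEARCH_MODEL", "gemini-3.5-flash"),
        opus_share_alert_threshold=float(env.get("OPUS_SHARE_ALERT_THRESHOLD", "0.20")),
        # Assumption: any casing of "true" disables Cloud Logging.
        disable_gcp_logging=env.get("DISABLE_GCP_LOGGING", "").upper() == "TRUE",
    )
```

Passing a plain dictionary, as in `load_config({"ENABLE_SEARCH_GROUNDING": "1"})`, keeps the loader testable without touching the real environment.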
- Deploy to Agent Platform Agent Engine: Run the deployment script. It compiles the ADK app and pushes it to Agent Platform.
  python deploy.py
  Output: The script logs the deployed Reasoning Engine resource name. Copy this identifier:
  projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>
- Save Resource Name: Export the full resource name as an environment variable for testing and updates:
  export REASONING_ENGINE_RESOURCE_NAME="projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>"
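A common slip at this step is exporting the template with its `<PROJECT_NUMBER>` or `<ENGINE_ID>` placeholders still in place. The sketch below validates the value before it is used; `parse_reasoning_engine_name` is a hypothetical helper, and the pattern is inferred only from the resource name format shown above.

```python
import re

RESOURCE_NAME_RE = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>[^/]+)"
)


def parse_reasoning_engine_name(name: str) -> dict:
    """Split a Reasoning Engine resource name into its three components."""
    name = name.strip()
    if "<" in name or ">" in name:
        raise ValueError("resource name still contains a template placeholder")
    match = RESOURCE_NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a Reasoning Engine resource name: {name!r}")
    return match.groupdict()
```

A script could call this on `os.environ["REASONING_ENGINE_RESOURCE_NAME"]` before issuing any queries or updates, so that a malformed value fails immediately with a readable message.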
Register the deployed Reasoning Engine in Gemini Enterprise via the Discovery Engine API:
export APP_ID="your_gemini_enterprise_app_id"
export REASONING_ENGINE="projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>"
bash register_ge.sh

Verify that the agent appears in your Gemini Enterprise UI dashboard.
You can run queries against the deployed Agent Platform engine using query_agent.py.
- Single-query mode:
  python query_agent.py "Prove that the sum of the first n odd numbers is n^2."
- Interactive mode: Run the script without arguments and type your queries at the prompt. The CLI will stream the responses and print routing transparency metadata (latency, chosen model, reasoning).
  python query_agent.py
If you modify code in router_agent/, you can update the active Reasoning Engine without changing its resource ID:
export REASONING_ENGINE_RESOURCE_NAME="projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>"
python update.py

To roll back, re-run the `register_ge.sh` script pointing to the previous Reasoning Engine resource ID.
Run tests locally using pytest.
- Run unit tests (fast, offline):
  pytest -m unit
- Run integration tests (using local InMemoryRunner, mock endpoints):
  pytest -m "not live"
- Run live tests (calls real Agent Platform and Model Garden endpoints; requires active internet and ADC authentication):
  pytest --run-live
The routing-accuracy evaluation suite measures classification effectiveness against a golden dataset representing standard enterprise use cases.
Run the evaluation:
python eval/run_eval.py

- Golden Dataset (`eval/golden_dataset.jsonl`): Seeded with annotated queries for all four tiers.
- Routing Accuracy (G-1): Target >= 85% exact-match accuracy.
- Opus Routing Precision (G-6): Target >= 80% precision on Opus queries to control costs.
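The two targets can be computed from pairs of expected and predicted tiers. The sketch below assumes the standard definitions (exact-match accuracy over all queries, and precision as the share of Opus-routed queries that were labeled Opus); the function names are hypothetical, and it does not read `eval/golden_dataset.jsonl`, whose schema is not shown here.

```python
def routing_accuracy(pairs):
    """Fraction of (expected, predicted) pairs where the tiers match exactly."""
    if not pairs:
        raise ValueError("no evaluation pairs supplied")
    return sum(expected == predicted for expected, predicted in pairs) / len(pairs)


def tier_precision(pairs, tier):
    """Of the queries routed to `tier`, the fraction that were labeled `tier`.

    Returns None when nothing was routed to the tier.
    """
    labels_of_routed = [expected for expected, predicted in pairs if predicted == tier]
    if not labels_of_routed:
        return None
    return sum(label == tier for label in labels_of_routed) / len(labels_of_routed)


# Illustrative data only: (expected tier, predicted tier).
sample = [
    ("claude_haiku", "claude_haiku"),
    ("claude_sonnet", "claude_sonnet"),
    ("claude_opus", "claude_opus"),
    ("claude_sonnet", "claude_opus"),  # over-routed to Opus
    ("claude_fable", "claude_fable"),
]
```

On this made-up sample, four of five routes match (accuracy 0.8) and one of the two Opus routes is correct (precision 0.5), so it would miss both the G-1 and G-6 targets.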
Claude Fable 5 (claude-fable-5) is integrated as the frontier reasoning tier.
- Mandatory Retention: Vertex AI enforces a 30-day data retention policy on Fable for safety monitoring.
- ZDR Restriction: Zero Data Retention (ZDR) is not supported on Fable.
- PII/HIPAA Compliance: Workloads with strict compliance constraints must not use Fable. The compliance guard automatically redirects queries containing sensitive data markers to `claude_opus`.
- Disabling Fable: If 30-day data retention conflicts with your corporate governance policies, disable the `claude_fable` tier. The router will automatically fall back to `claude-opus-4-8` for frontier tasks.
Real-time Google Search grounding is disabled by default. If enabled:
- Scope: Restricted to the Haiku tier only to optimize performance and prevent tool usage loops in frontier models.
- Execution: Grounding uses `google_search` isolated in a dedicated `search_subagent` (running Gemini 3.5 Flash). The main Haiku agent executes it as a tool via the documented ADK `AgentTool` pattern.
- Activation:
  export ENABLE_SEARCH_GROUNDING=1
  export ROUTER_SEARCH_MODEL=gemini-3.5-flash