mbettan/RouteMind

Intelligent Model-Routing Agent (ADK 2.0)

A four-tier intelligent routing agent built on the Google Antigravity (AGY) SDK / ADK 2.0 for Gemini Enterprise. It classifies user queries in real time using Gemini 3.1 Flash-Lite (gemini-3.1-flash-lite) and deterministically dispatches each one to Claude Haiku (claude-haiku-4-5), Claude Sonnet (claude-sonnet-4-6), Claude Opus (claude-opus-4-8), or Claude Fable (claude-fable-5) via Model Garden on the Agent Platform.


Architecture & Workflow

The routing agent is defined as a sequential Workflow graph containing guards, a classifier, and a deterministic dispatch engine.

graph TD
    START[User Input] --> CG[context_guard_node]
    CG -->|Extract token length| COMP[compliance_guard_node]
    COMP -->|Check ZDR / Fable Refusal| CLAS[classifier_agent]
    CLAS -->|Structured routing decision| DISP[dispatch_node]
    DISP -->|claude_haiku| HAIKU[claude_haiku_specialist]
    DISP -->|claude_sonnet| SONNET[claude_sonnet_specialist]
    DISP -->|claude_opus| OPUS[claude_opus_specialist]
    DISP -->|claude_fable| FABLE[claude_fable_specialist]
    HAIKU --> END[Response]
    SONNET --> END
    OPUS --> END
    FABLE --> END

Routing Stages:

  1. context_guard_node: Estimates the token count of the query. If the estimate exceeds LONG_CONTEXT_ROUTE_THRESHOLD (200,000 tokens), it sets long_context = True in the session state delta.
  2. compliance_guard_node: Checks the text for sensitive keywords (ZDR markers like phi, gdpr, ssn, pii, etc.) and domains that trigger high false-positive safety refusals in Fable (e.g., cybersecurity, medical genetics, malware). It sets zdr_required and fable_refusal flags.
  3. classifier_agent: A Gemini 3.1 Flash-Lite LlmAgent that analyzes query complexity, latency sensitivity, and token volume to output a structured JSON schema (RouteResultSchema) selecting the target tier.
  4. dispatch_node: A deterministic Python function node (not an LLM). If the classifier makes an invalid selection, it defaults to claude_haiku. It applies deterministic overrides:
    • Context Overflow: If long_context = True and the chosen tier was claude_haiku, the route is overridden to claude_sonnet.
    • Compliance/ZDR Safeguards: If zdr_required = True or fable_refusal = True, it overrides routes selecting claude_fable to claude_opus.
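
The guard flags and dispatch overrides described above can be sketched in plain Python. This is a minimal illustration, not the project's code: the keyword tuples, the 4-characters-per-token estimate, and the function names guard_flags and dispatch are assumptions, and the order of the overrides (default first, then context, then compliance) is one plausible reading of the list.

```python
import re

LONG_CONTEXT_ROUTE_THRESHOLD = 200_000  # tokens, as stated in the routing stages
VALID_TIERS = ("claude_haiku", "claude_sonnet", "claude_opus", "claude_fable")

# Hypothetical marker lists; the real guard's full lists are not shown here.
ZDR_MARKERS = ("phi", "gdpr", "ssn", "pii")
FABLE_REFUSAL_MARKERS = ("cybersecurity", "medical genetics", "malware")


def estimate_tokens(text: str) -> int:
    """Rough estimate of about 4 characters per token (an assumption)."""
    return len(text) // 4


def guard_flags(text: str) -> dict:
    """Compute the three flags that the two guard nodes write to session state."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))  # whole words, so "philosophy" != "phi"
    return {
        "long_context": estimate_tokens(text) > LONG_CONTEXT_ROUTE_THRESHOLD,
        "zdr_required": any(marker in words for marker in ZDR_MARKERS),
        "fable_refusal": any(marker in lowered for marker in FABLE_REFUSAL_MARKERS),
    }


def dispatch(chosen_tier: str, state: dict) -> str:
    """Apply the deterministic overrides to the classifier's choice."""
    # Invalid classifier output falls back to the cheapest tier.
    tier = chosen_tier if chosen_tier in VALID_TIERS else "claude_haiku"
    # Context overflow: Haiku is bumped to Sonnet.
    if state.get("long_context") and tier == "claude_haiku":
        tier = "claude_sonnet"
    # Compliance/ZDR safeguards: Fable is redirected to Opus.
    if (state.get("zdr_required") or state.get("fable_refusal")) and tier == "claude_fable":
        tier = "claude_opus"
    return tier
```

For example, dispatch("claude_fable", guard_flags("Summarize this record, SSN included")) returns claude_opus, because the whole-word match on ssn sets zdr_required.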

Model Routing Matrix

| Tier Name | Target Model ID | Cost (Input / Output per 1M tokens) | Primary Typology |
|---|---|---|---|
| Classifier | gemini-3.1-flash-lite | $0.25 / $1.50 | Routing brain & structured classification |
| claude_haiku | claude-haiku-4-5 | $1.00 / $5.00 | Simple facts, greetings, summaries, short conversations |
| claude_sonnet | claude-sonnet-4-6 | $3.30 / $16.50 | Code generation, technical writing, document analysis |
| claude_opus | claude-opus-4-8 | $5.50 / $27.50 | Deep mathematical reasoning, complex logic |
| claude_fable | claude-fable-5 | $10.00 / $50.00 | Frontier tasks, long-horizon multi-step deliverables |
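
To make the price differences concrete, the per-request cost can be worked out from the matrix. The sketch below copies the prices from the table; the function names and the assumption that the classifier reads the same input tokens and emits about 50 output tokens are illustrative only.

```python
# (input, output) price in USD per 1M tokens, copied from the routing matrix.
PRICES = {
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-6": (3.30, 16.50),
    "claude-opus-4-8": (5.50, 27.50),
    "claude-fable-5": (10.00, 50.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single call to one model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def routed_cost(model: str, input_tokens: int, output_tokens: int,
                classifier_output_tokens: int = 50) -> float:
    """Specialist cost plus classifier overhead.

    Assumes the classifier sees the same input and returns a short JSON
    decision; 50 output tokens is a hypothetical figure.
    """
    overhead = request_cost("gemini-3.1-flash-lite", input_tokens, classifier_output_tokens)
    return overhead + request_cost(model, input_tokens, output_tokens)
```

Under these assumptions, a 2,000-token prompt with a 500-token answer on claude_sonnet costs about $0.01485, plus roughly $0.0006 of classifier overhead. The same request on claude_fable costs $0.045, which is why the routing decision matters for cost.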

Key Design Decisions

  • Session Path Extraction (FR-18): Gemini Enterprise provides session identifiers as fully qualified resource paths (e.g., projects/.../engines/.../sessions/session_123). Since ADK's built-in session validation regex (^[A-Za-z0-9_-]+$) rejects /, the application extracts the terminal identifier (session_123) at the controller entrypoint boundary before passing it to SessionService. No fragile monkeypatches of internal SDK regexes are used.
  • Native State Updates (FR-19): Graph state changes are set via state_delta on the native run_async / run runner method calls.
  • Self-Test on Startup (FR-21 / RK-1): Introspects Runner.run_async at startup to verify state_delta is supported. Fails fast if the ADK runner contract drifts.
  • Escalation Cascade (FR-13): The harness-level fallback cascade automatically elevates failing queries to the next tier on retryable errors (e.g. 429 rate limits, 5xx, timeouts). Loop safety is guaranteed by bounding the hops to MAX_ESCALATIONS_PER_TURN = 1 (FR-15).
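
Three of these decisions are easy to illustrate in isolation. The sketch below is not the repository's implementation: the function names, the RetryableError class, and the tier ordering used for escalation are assumptions made for the example, and the escalation helper ignores the compliance flags for brevity.

```python
import inspect
import re

SESSION_ID_RE = re.compile(r"^[A-Za-z0-9_-]+$")  # the ADK validation regex quoted above
MAX_ESCALATIONS_PER_TURN = 1
TIER_ORDER = ["claude_haiku", "claude_sonnet", "claude_opus", "claude_fable"]  # assumed order


def extract_session_id(resource_path: str) -> str:
    """FR-18: keep only the terminal segment of a fully qualified session path."""
    session_id = resource_path.rstrip("/").rsplit("/", 1)[-1]
    if not SESSION_ID_RE.fullmatch(session_id):
        raise ValueError(f"unsupported session id: {session_id!r}")
    return session_id


def supports_state_delta(run_method) -> bool:
    """FR-21: check that a runner method still accepts a state_delta argument."""
    return "state_delta" in inspect.signature(run_method).parameters


class RetryableError(Exception):
    """Hypothetical stand-in for 429, 5xx, and timeout failures."""


def call_with_escalation(tier: str, call):
    """FR-13/FR-15: retry on the next tier, at most MAX_ESCALATIONS_PER_TURN times."""
    hops = 0
    while True:
        try:
            return tier, call(tier)
        except RetryableError:
            index = TIER_ORDER.index(tier)
            if hops >= MAX_ESCALATIONS_PER_TURN or index + 1 >= len(TIER_ORDER):
                raise  # hop budget exhausted or no higher tier left
            tier = TIER_ORDER[index + 1]
            hops += 1
```

At startup, a check such as supports_state_delta(Runner.run_async) could raise immediately if it returns False. Because the hop counter is bounded by a constant, the loop in call_with_escalation runs at most two attempts per turn.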

Setup & Installation

Follow these steps to set up the project on your local machine.

1. Initialize Virtual Environment

Python 3.12 or later is strongly recommended so that the local runtime matches the pinned dependencies.

python3 -m venv .venv
source .venv/bin/activate

2. Install Dependencies

Install the required packages pinned in requirements.txt:

pip install --upgrade pip
pip install -r requirements.txt

3. Troubleshooting OpenTelemetry Dependency Conflicts

google-adk depends on specific versions of OpenTelemetry packages. Mismatches between opentelemetry-api and opentelemetry-sdk will trigger ImportError or installation errors during package resolution.

  • Verification: Check that both packages report exactly the same version.
    pip show opentelemetry-api opentelemetry-sdk
  • Resolution: If a version conflict occurs, force install version 1.41.1 (or another aligned version) to resolve the mismatch:
    pip install "opentelemetry-api==1.41.1" "opentelemetry-sdk==1.41.1"
  • Pydantic Warnings: Ensure pydantic==2.13.4 and pydantic-core==2.46.4 are pinned to avoid base-image serialization failures when deploying to Agent Platform (Bug #6041).
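
The same version check can be scripted with the standard library, which is convenient in CI. This is a small sketch; the helper names are hypothetical, and it only compares the two OpenTelemetry packages named above.

```python
from importlib import metadata

OTEL_PACKAGES = ("opentelemetry-api", "opentelemetry-sdk")


def otel_versions(lookup=metadata.version) -> dict:
    """Return the installed version of each package, or None if it is missing."""
    versions = {}
    for package in OTEL_PACKAGES:
        try:
            versions[package] = lookup(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


def is_aligned(versions: dict) -> bool:
    """True only when every package is installed and all versions are identical."""
    found = set(versions.values())
    return None not in found and len(found) == 1


if __name__ == "__main__":
    installed = otel_versions()
    print(installed)
    if not is_aligned(installed):
        raise SystemExit("opentelemetry-api and opentelemetry-sdk versions differ")
```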

4. GCP & Application Default Credentials (ADC) Setup

Ensure you have the Google Cloud SDK installed. Authorize using Application Default Credentials (ADC):

gcloud auth application-default login

Set your target GCP project:

export GOOGLE_CLOUD_PROJECT="YOUR_PROJECT_ID"
export GOOGLE_CLOUD_PROJECT_NUMBER="YOUR_PROJECT_NUMBER"

Environment Configuration

Configure the agent using the following environment variables:

| Variable Name | Default Value | Purpose |
|---|---|---|
| ROUTER_CLASSIFIER_MODEL | gemini-3.1-flash-lite | The LLM used to classify incoming queries. |
| USE_SONNET_CLASSIFIER | 0 (False) | Set to 1 to override and use Claude Sonnet as the classifier. |
| ENABLE_SEARCH_GROUNDING | 0 (False) | Set to 1 to enable real-time search grounding (Haiku only). |
| ROUTER_SEARCH_MODEL | gemini-3.5-flash | The sub-agent model performing Google searches. |
| OPUS_SHARE_ALERT_THRESHOLD | 0.20 | Ceiling of Opus traffic share before triggering cost alert logs. |
| DISABLE_GCP_LOGGING | Not set | Set to TRUE to disable sending logs to Google Cloud Logging. |
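
One way to read these variables with their documented defaults is a small typed loader. The RouterConfig class and load_config function are hypothetical names, and treating DISABLE_GCP_LOGGING as case-insensitive is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    classifier_model: str
    use_sonnet_classifier: bool
    enable_search_grounding: bool
    search_model: str
    opus_share_alert_threshold: float
    disable_gcp_logging: bool


def load_config(env=None) -> RouterConfig:
    """Build the config from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return RouterConfig(
        classifier_model=env.get("ROUTER_CLASSIFIER_MODEL", "gemini-3.1-flash-lite"),
        use_sonnet_classifier=env.get("USE_SONNET_CLASSIFIER", "0") == "1",
        enable_search_grounding=env.get("ENABLE_SEARCH_GROUNDING", "0") == "1",
        search_model=env.get("ROUTER_SEARCH_MODEL", "gemini-3.5-flash"),
        opus_share_alert_threshold=float(env.get("OPUS_SHARE_ALERT_THRESHOLD", "0.20")),
        # Assumption: any capitalization of "true" disables Cloud Logging.
        disable_gcp_logging=env.get("DISABLE_GCP_LOGGING", "").upper() == "TRUE",
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to exercise in unit tests, for example load_config({"ENABLE_SEARCH_GROUNDING": "1"}).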

Deployment

  1. Deploy to Agent Platform Agent Engine: Run the deployment script. It compiles the ADK app and pushes it to Agent Platform.

    python deploy.py

    Output: This will log the deployed Reasoning Engine resource name. Copy this identifier: projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>

  2. Save Resource Name: Export the full resource name as an environment variable for testing and updates:

    export REASONING_ENGINE_RESOURCE_NAME="projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>"

Gemini Enterprise Registration

Register the deployed Reasoning Engine in Gemini Enterprise via the Discovery Engine API:

export APP_ID="your_gemini_enterprise_app_id"
export REASONING_ENGINE="projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>"
bash register_ge.sh

Verify that the agent appears in your Gemini Enterprise UI dashboard.


Interactive Testing & CLI Client

You can run queries against the deployed Agent Platform engine using query_agent.py.

  • Single-query mode:
    python query_agent.py "Prove that the sum of the first n odd numbers is n^2."
  • Interactive mode:
    python query_agent.py
    Type your queries at the prompt. The CLI will stream the responses and print routing transparency metadata (latency, chosen model, reasoning).

Updating & Rollback

In-Place Updating

If you modify code in router_agent/, you can update the active Reasoning Engine without changing its resource ID:

export REASONING_ENGINE_RESOURCE_NAME="projects/<PROJECT_NUMBER>/locations/us-central1/reasoningEngines/<ENGINE_ID>"
python update.py

Rollback

To roll back, re-run the register_ge.sh script with REASONING_ENGINE pointing to the previous Reasoning Engine resource name.


Testing Suite

Run tests locally using pytest.

  • Run unit tests (fast, offline):
    pytest -m unit
  • Run integration tests (using local InMemoryRunner, mock endpoints):
    pytest -m "not live"
  • Run live tests (calls real Agent Platform and Model Garden endpoints; requires active internet and ADC authentication):
    pytest --run-live

Evaluation

The routing-accuracy evaluation suite measures classification effectiveness against a golden dataset representing standard enterprise use cases.

Run the evaluation:

python eval/run_eval.py

Metrics & Targets:

  • Golden Dataset (eval/golden_dataset.jsonl): Seeded with annotated queries for all four tiers.
  • Routing Accuracy (G-1): Target >= 85% exact-match accuracy.
  • Opus Routing Precision (G-6): Target >= 80% precision on Opus queries to control costs.
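
The two targets are standard classification metrics and can be computed from pairs of expected and predicted tiers. The sketch below is illustrative and does not reproduce eval/run_eval.py; the pair format and function names are assumptions, and the sample data is invented for the example.

```python
def routing_accuracy(pairs) -> float:
    """Exact-match accuracy: share of queries routed to the annotated tier."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no evaluation pairs")
    return sum(expected == predicted for expected, predicted in pairs) / len(pairs)


def tier_precision(pairs, tier: str = "claude_opus"):
    """Of all queries routed to `tier`, the share that were annotated as `tier`.

    Returns None when nothing was routed to the tier.
    """
    routed = [(expected, predicted) for expected, predicted in pairs if predicted == tier]
    if not routed:
        return None
    return sum(expected == tier for expected, _ in routed) / len(routed)


# Illustrative (expected, predicted) pairs, not taken from the golden dataset.
sample = [
    ("claude_haiku", "claude_haiku"),
    ("claude_opus", "claude_opus"),
    ("claude_sonnet", "claude_opus"),  # over-routed to Opus: hurts precision
    ("claude_opus", "claude_opus"),
    ("claude_fable", "claude_fable"),
]
accuracy = routing_accuracy(sample)   # 4 of 5 correct
opus_precision = tier_precision(sample)  # 2 of 3 Opus routes were correct
```

In this sample, accuracy is 0.8 and Opus precision is about 0.67, so both would miss the G-1 and G-6 targets. Low Opus precision means cheaper-tier queries are being sent to a more expensive model.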

⚠️ Data Retention Policy (Claude Fable 5)

Claude Fable 5 (claude-fable-5) is integrated as the frontier reasoning tier.

  • Mandatory Retention: Vertex AI enforces a 30-day data retention policy on Fable for safety monitoring.
  • ZDR Restriction: Zero Data Retention (ZDR) is not supported on Fable.
  • PII/HIPAA Compliance: Workloads with strict compliance constraints must not use Fable. The compliance guard automatically redirects queries containing sensitive data markers to claude_opus.
  • Disabling Fable: If 30-day data retention conflicts with your corporate governance policies, disable the claude_fable tier. The router will automatically fall back to claude-opus-4-8 for frontier tasks.

Search Grounding (FR-9, optional, Haiku-only)

Real-time Google Search grounding is disabled by default. If enabled:

  • Scope: Restricted to the Haiku tier only, to optimize performance and prevent tool-usage loops in frontier models.
  • Execution: Grounding uses google_search isolated in a dedicated search_subagent (running Gemini 3.5 Flash). The main Haiku agent executes it as a tool via the documented ADK AgentTool pattern.
  • Activation:
    export ENABLE_SEARCH_GROUNDING=1
    export ROUTER_SEARCH_MODEL=gemini-3.5-flash

About

A robust routing architecture built on the Google Agent Development Kit (ADK 2.0) and deployed to Vertex AI Reasoning Engine (Agent Engine). It uses a lightweight Gemini 3.1 Flash-Lite classifier to dynamically route user queries to the optimal Claude specialist tier (Haiku 4.5, Sonnet 4.6, Opus 4.8, or Fable 5).
