Skip to content

feat(router): intelligent cost-aware + latency-aware routing#414

Open
Barsit wants to merge 1 commit into
ENTERPILOT:mainfrom
Barsit:feat/intelligent-routing
Open

feat(router): intelligent cost-aware + latency-aware routing#414
Barsit wants to merge 1 commit into
ENTERPILOT:mainfrom
Barsit:feat/intelligent-routing

Conversation

@Barsit

@Barsit Barsit commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator

Summary

When multiple providers serve the same model ID, the gateway now proactively selects the best provider based on a weighted cost + latency score — instead of the previous deterministic first-wins strategy where the first registered provider always handled the request.

Architecture

Request: model="gpt-4o" (no provider pinned)
         |
         v
    resolveProvider()
         |
         +- providerHint != "" --> exact route (existing logic, unchanged)
         |
         +- providerHint == "" --> collectCandidates(modelID)
                                       |
                                       v
                                  StrategyRegistry.Resolve(ctx)
                                       |
                                       v
                                  RoutingStrategy.Select()
                                       |
                                       v
                                  Best candidate --> route

New components (8 files)

Routing strategies under internal/router/:

Component File Responsibility
RoutingStrategy interface strategy.go Pluggable strategy + ProviderCandidate
BalancedStrategy strategy_balanced.go Weighted cost+latency scoring (default)
CostOnlyStrategy strategy_cost.go Pure cost selection
LatencyOnlyStrategy strategy_latency.go Pure latency selection
FirstFitStrategy strategy_firstfit.go First acceptable (historical compat)
Scoring helpers scoring.go Normalization, filtering, tie-breaking
StrategyRegistry registry.go Name to factory map + override resolution
Context helpers context.go Context key helpers

Supporting:

Package File Responsibility
internal/llmclient/ latency.go EWMA latency/error tracker
internal/core/ interfaces.go (+ProviderStats) Runtime stats interface
internal/server/ routing_strategy.go X-GoModel-Routing-* header capture
config/ router.go Router config struct + validation

Modified components

File Change
llmclient/client.go Integrate LatencyTracker, implement ProviderStats
providers/registry.go Add ListProvidersForModel (O(k), registration-order)
providers/router.go Insert strategy selection into resolveProvider
providers/init.go Wire strategy registry at startup
providers/strategy_init.go Build registry from RouterConfig
gateway/inference_prepare.go Set RoutingEligible context flag
config/config.go Add RouterConfig field
config/config.example.yaml Document router section
.env.template Document MODEL_ROUTING_* env vars
CLAUDE.md Document intelligent routing summary

Key design decisions

  1. Default balanced strategy (cost 0.6 + latency 0.4). Single-provider configs are unaffected (one candidate, no decision needed)
  2. Missing data scores 1.0 (worst), so complete-data providers are preferred
  3. Ties broken by ProviderName lexicographic order for determinism
  4. Three single-purpose strategies: cost_only, latency_only, first_fit
  5. Per-request header overrides: X-GoModel-Routing-Strategy and X-GoModel-Routing-Weights; invalid values silently fall back (WARN log)
  6. Explicit provider bypasses routing: openai-east/gpt-4o or req.Provider non-empty leads to exact route, no strategy involvement

Bugs fixed during development

  1. Mutual recursion: cost/latency single-factor strategies recursed infinitely when both dimensions were absent. Fixed by computing the fallback dimension's score inline
  2. Free model mis-classification: normalizeCost treated cost=0 as unknown. Fixed by returning -1 sentinel from perMtokCost when pricing is absent
  3. Production path never executed strategy: PrepareChatRequest set req.Provider via ResolveModel, then resolveProvider skipped strategy when providerHint was non-empty. Fixed via RoutingEligible context flag set during prepare for unqualified requests
  4. FirstFit order: used alpha-sorted candidates instead of registration order. Fixed by implementing ListProvidersForModel returning entries in registration order

Testing

  • 38 unit tests (router + llmclient): strategy selection logic, edge cases (empty candidates, open circuits, error rate boundary, equal scores, free model vs unknown, tiebreaks), registry resolution, EWMA convergence
  • 8 integration tests (providers): cheaper/faster selection, header override, invalid override fallback, explicit provider bypass, single-candidate no-op, weights override, FirstFit registration order
  • lint: 0 issues, full project build passes
  • All new tests + existing Router tests pass

Summary by CodeRabbit

  • New Features
    • Added intelligent model-to-provider routing with selectable strategies (balanced, cost-only, latency-only, first-fit) using pricing/latency plus error-rate & circuit status.
    • Support per-request routing overrides and request-level weights; routing respects explicit provider pinning and provider-qualified model syntax.
  • Configuration & Docs
    • Added router: configuration and documented the related environment/header options; updated example configuration and templates.
  • Tests
    • Added unit and integration tests covering selection, overrides, filtering, and ordering.

@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds an intelligent routing subsystem to the model gateway. When a request does not pin a provider, a configurable strategy (balanced, cost_only, latency_only, first_fit) scores available provider candidates using EWMA-tracked latency/error-rate metrics and model pricing, then routes to the best candidate. Per-request strategy and weight overrides are accepted via HTTP headers.

Changes

Intelligent Routing for Multi-Provider Gateway

Layer / File(s) Summary
Router core contracts
internal/router/strategy.go, internal/core/interfaces.go, internal/router/context.go
Defines ProviderCandidate, RoutingStrategy interface, optional ProviderStats interface (P50/P99/ErrorRate/CircuitState), and context helpers for routing eligibility and per-request strategy/weight overrides.
EWMA latency tracking and client stats
internal/llmclient/latency.go, internal/llmclient/client.go, internal/llmclient/latency_test.go
EWMA-based LatencyTracker tracks P50, P99, and error rate; integrated into Client construction and completeScope recording; P50Latency/P99Latency/ErrorRate/CircuitState methods satisfy ProviderStats. Unit and concurrency tests included.
Candidate scoring and ranking
internal/router/scoring.go
Implements cost and latency normalization into [0,1] scores, circuit/error-rate gating via acceptable/isCircuitOpen, rankBy/pickBest with deterministic ProviderName tie-breaking, and ErrNoAcceptableCandidate.
Four strategy implementations and unit tests
internal/router/strategy_balanced.go, strategy_cost.go, strategy_latency.go, strategy_firstfit.go, internal/router/strategy_test.go
Implements BalancedStrategy, CostOnlyStrategy, LatencyOnlyStrategy, and FirstFitStrategy each with data-availability fallbacks. Full unit-test coverage for all strategies, selection rules, fallback paths, boundary filtering, deterministic ties, and default weighting.
StrategyRegistry, config types, and validation
internal/router/registry.go, internal/router/registry_test.go, config/router.go, config/config.go
StrategyRegistry stores factories, resolves context overrides, exposes ParseWeights. RouterConfig/RouterWeights types with constants and DefaultRouterConfig; ValidateRouterConfig normalizes strategy, parses WeightsCSV, validates MaxErrorRate. Config.Router field added; Load() validates on startup. Comprehensive registry and config tests.
Provider registry ordering and strategy wiring
internal/providers/registry.go, internal/providers/strategy_init.go, internal/providers/init.go
ModelRegistry gains providerRegistrationOrder and ListProvidersForModel for ordered candidate enumeration. buildStrategyRegistry constructs a populated StrategyRegistry from RouterConfig; wired into NewRouter via WithStrategyRegistry option.
Request eligibility marking and provider router orchestration
internal/gateway/inference_prepare.go, internal/providers/router.go
inference_prepare.go sets WithRoutingEligible when no provider is pinned and model has no provider prefix. providers/router.go adds applyStrategy orchestration, collectCandidates/buildCandidatesFromEntries (with ProviderStats population), and applyWeightsOverride for context-driven weight override handling.
HTTP middleware and integration tests
internal/server/routing_strategy.go, internal/server/http.go, internal/providers/router_strategy_integration_test.go
RoutingStrategyCapture middleware reads X-GoModel-Routing-Strategy and X-GoModel-Routing-Weights, injects overrides into context, WARN-logs invalid weights, and is registered before workflow resolution. Integration tests validate cheaper/faster selection, override behavior, explicit provider bypass, weight overrides, model-syntax pinning, and first-fit ordering.
Documentation and repository metadata
.env.template, config/config.example.yaml, CLAUDE.md, .idea/*, .gitignore, coverage-bailian
Updates intelligent routing documentation/templates; adds IDE project metadata files; updates .gitignore; updates coverage profile artifact.

Sequence Diagram

sequenceDiagram
  actor Client
  participant Middleware as RoutingStrategyCapture
  participant Prepare as inference_prepare.go
  participant PRouter as providers/router.go
  participant StratReg as StrategyRegistry
  participant LLMClient as llmclient.Client

  Client->>Middleware: HTTP request (optional X-GoModel-Routing-Strategy/Weights headers)
  Middleware->>Middleware: parse headers, apply WithStrategyOverride / WithWeightsOverride to ctx
  Middleware->>Prepare: next(ctx)
  Prepare->>Prepare: detect no provider pin → WithRoutingEligible(ctx)
  Prepare->>PRouter: ResolveModel(ctx, modelID)
  PRouter->>PRouter: IsRoutingEligible(ctx)?
  PRouter->>PRouter: collectCandidates → buildCandidatesFromEntries
  Note over PRouter,LLMClient: ProviderStats (P50/P99/ErrorRate/CircuitState) read from each LLMClient
  PRouter->>StratReg: Resolve(ctx) → strategy (override or default)
  PRouter->>PRouter: applyWeightsOverride (clone BalancedStrategy if applicable)
  PRouter->>StratReg: strategy.Select(ctx, candidates)
  StratReg-->>PRouter: *ProviderCandidate (best scored)
  PRouter-->>Prepare: resolved provider + selector
  Prepare-->>Client: routed request forwarded to best provider
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 Hippity hop, which provider today?
Cost and latency mapped to arrays!
EWMA smooths the bumps in the road,
Balanced or first-fit—the gateway's new mode.
X-GoModel-Routing-Strategy in the header? Jolly good!
The best provider chosen, just as it should. 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 48.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main feature: intelligent cost-aware and latency-aware routing for multi-provider model endpoints, which is the primary focus of all 40+ file changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps

greptile-apps Bot commented Jun 20, 2026

Copy link
Copy Markdown

Confidence Score: 3/5

Several core routing paths can choose the wrong provider and should be fixed before merging.

  • Slash-containing model IDs can miss the new routing path.
  • Cost-only routing can prefer unknown pricing over known-free pricing.
  • Audio and realtime requests can be routed to providers that do not support those endpoints.
  • The zero error-rate threshold is accepted in config but not applied at runtime.

internal/gateway/inference_prepare.go, internal/router/strategy_cost.go, internal/providers/router.go, config/router.go, internal/llmclient/latency.go

Reviews (1): Last reviewed commit: "feat(router): add intelligent cost-aware..." | Re-trigger Greptile

Comment on lines +234 to +242
func hasProviderInModel(model *string) bool {
if model == nil {
return false
}
m := strings.TrimSpace(*model)
firstSlash := strings.IndexByte(m, '/')
// A provider prefix is a non-empty segment before a slash that is NOT itself
// a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b").
return firstSlash > 0 && firstSlash < len(m)-1

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Slash models bypass routing

This treats any model containing a slash as provider-pinned, but valid unqualified model IDs can also contain slashes, such as meta-llama/Meta-Llama-3-70B. When such a request has an empty Provider, prepare skips WithRoutingEligible, then ApplyResolvedSelector stamps the first resolved provider into the request. The execution path then sees a non-empty provider hint and skips intelligent routing, so these multi-provider models keep the old first-wins behavior.

Context Used: CLAUDE.md (source)

Comment thread internal/router/strategy_cost.go Outdated
Comment on lines +33 to +37
if _, max := minMaxCost(candidates); max == 0 {
latMin, latMax := minMaxLatency(candidates)
scores := normalizeLatency(latencyValues(candidates), latMin, latMax)
return rankBy(candidates, scores, maxErr)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Free pricing degrades

This branch uses max == 0 to mean that no provider has pricing, but known-free models also produce a max cost of zero. With a free but slower provider and an unknown-priced faster provider, cost_only falls into latency ranking and can choose the unknown-priced provider over the known-free one. That makes the cost-only strategy ignore the very pricing sentinel the scoring code uses to distinguish free from unknown.

Comment thread internal/providers/router.go Outdated
Comment on lines +315 to +320
eligible := strings.TrimSpace(providerHint) == "" || router.IsRoutingEligible(ctx)
if eligible && r.strategyRegistry != nil {
if chosen, newSelector, ok := r.applyStrategy(ctx, selector); ok {
selector = newSelector
p = chosen
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Capability checks happen late

Intelligent routing runs before endpoint capability checks, so an audio or realtime request can be rerouted to a cheaper or faster provider that serves the same model ID but does not implement the required optional interface. For example, CreateSpeech resolves through this path, then routeAudioCall type-asserts core.AudioProvider afterwards and returns model ... does not support audio operations. A model that worked through the first audio-capable provider can start failing because routing selected a provider that is only valid for other endpoints.

Context Used: CLAUDE.md (source)

Comment thread config/router.go
Comment on lines +86 to +88
if cfg.MaxErrorRate < 0 || cfg.MaxErrorRate > 1 {
return fmt.Errorf("router.max_error_rate must be in [0, 1], got %v", cfg.MaxErrorRate)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Zero threshold ignored

The config validator accepts router.max_error_rate: 0 as a valid value, but each strategy later treats MaxErrorRate == 0 as “use the default 0.5”. An operator who configures zero to block any provider with a nonzero smoothed error rate will silently get the default threshold instead, so providers with error rates below 0.5 can still receive traffic.

Context Used: CLAUDE.md (source)

Comment on lines +77 to +82
}

// Record observes the duration and error status of a completed request.
// duration is the total request latency; isError indicates whether the
// request was considered failed for routing purposes.
func (t *LatencyTracker) Record(duration time.Duration, isError bool) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Submillisecond samples vanish

duration.Milliseconds() truncates any latency below 1ms to zero, and the routing strategies treat zero latency as unknown data. For a very fast local or in-cluster provider, latency-only routing can therefore decide that no latency data exists and fall back to cost ranking, picking a slower provider even though the fast provider has real samples.

@codecov-commenter

codecov-commenter commented Jun 20, 2026

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config/router.go`:
- Around line 80-88: The current config validation for router weights and
max_error_rate checks for negative and out-of-range values, but does not reject
NaN or other non-finite numbers that strconv.ParseFloat can produce. Add checks
using math.IsNaN() or math.IsInf() (or a helper function) to validate that
cfg.Weights.Cost, cfg.Weights.Latency, and cfg.MaxErrorRate are finite numbers.
Include these finite-value checks alongside the existing negative and range
validation checks in the error return statements, and apply the same validation
logic to the similar checks mentioned at lines 110-115.

In `@internal/gateway/inference_prepare.go`:
- Around line 231-243: The hasProviderInModel function currently returns true
for any model string containing a slash, but it should distinguish between
actual provider prefixes like "deepseek/deepseek-v4-flash" and plain model IDs
containing slashes like "meta-llama/Llama-3-70b". Update the return statement to
add additional logic beyond the current firstSlash check to properly identify
whether the segment before the slash is an actual provider prefix or just part
of a model ID name. This will ensure plain model IDs with slashes are not
incorrectly marked as provider-pinned and can properly use intelligent routing.

In `@internal/llmclient/latency_test.go`:
- Around line 28-37: Add a new regression test function (e.g.,
TestLatencyTracker_LeadingSuccessesDoNotSkewErrorRate) after the existing
TestLatencyTracker_ErrorRateReflectsFailures test that verifies ErrorRate()
behavior when multiple successful samples are recorded before a single failure
occurs. The test should create a NewLatencyTracker, record several successful
samples by calling Record with appropriate parameters indicating success (false
or 0), then record one failure, and finally assert that ErrorRate() returns a
value that does not spike to 1.0 but remains below the routing cutoff threshold,
demonstrating that a single failure after many successes does not incorrectly
skew the error rate calculation.

In `@internal/llmclient/latency.go`:
- Around line 15-19: The ewma struct uses e.value == 0 to detect the first
sample, which incorrectly handles valid zero-valued samples (such as error rate
of 0 for successful requests). Add a boolean field to the ewma struct to
explicitly track whether the EWMA has been initialized. Then update any methods
that check e.value == 0 as an initialization condition (mentioned in lines
30-38) to instead check this new boolean flag. This ensures the first sample is
properly initialized regardless of its value.

In `@internal/providers/registry.go`:
- Around line 717-725: The ListProvidersForModel method documentation states it
returns an empty slice when the model is not found, but the current
implementation returns nil instead. Replace the nil return statement in the
early exit condition (when modelID is empty or no providers are registered) with
an empty slice of type ModelWithProvider to honor the documented API contract
and eliminate nil-shape ambiguity.

In `@internal/providers/router_strategy_integration_test.go`:
- Around line 176-202: The test
TestRouter_IntelligentRouting_ExplicitProviderHintBypasses currently only
validates provider pinning through the explicit Provider field, but does not
cover provider pinning through the model syntax format. Add a second
ChatCompletion call within the same test that uses Model: "openai-west/gpt-4o"
with Provider left empty to verify that explicit provider pins specified via the
model-qualified syntax are also respected by the routing strategy and not
overridden based on cost. Assert that the response ID is still "west-resp" to
confirm the expensive provider is selected when pinned through the model syntax,
ensuring the strategy does not bypass such explicit provider qualifications.

In `@internal/providers/router.go`:
- Around line 309-317: The eligible variable calculation on line 315 only checks
if providerHint is empty but fails to account for explicit provider
specifications using the provider/model syntax. When a user pins a provider by
including it in the model string (e.g., "provider/model"), that explicit choice
should prevent re-routing through the strategy registry. Modify the eligible
variable assignment to also check whether the model or selector contains an
explicit provider specification, and only mark requests as eligible for
re-routing when no provider is pinned either through providerHint or through the
provider/model syntax in the selector.

In `@internal/router/context.go`:
- Around line 35-46: The WithStrategyOverride function currently stores empty
strings in the context, but the documentation states that empty names should
clear any prior override. Modify the WithStrategyOverride function to check if
the strategy parameter is empty and, if so, delete the value from the context
(or return the original context unchanged) rather than storing an empty string.
This ensures that StrategyOverrideFromContext returns the expected (empty
string, false) when no override is set, matching the documented behavior.

In `@internal/router/registry_test.go`:
- Around line 68-75: The TestParseWeights_Invalid test function is missing
coverage for edge cases like trailing garbage and non-finite float values.
Expand the cases slice in the TestParseWeights_Invalid function to include
additional invalid weight strings such as strings with trailing garbage (e.g.,
"0.6x,0.4"), and non-finite float representations (e.g., "NaN,0.4", "Inf,0.4").
This ensures the ParseWeights function properly handles and rejects these
malformed inputs as part of its error handling path, making the parser behavior
regression-proof.

In `@internal/router/registry.go`:
- Around line 98-106: The weight parsing logic in the function that handles Cost
and Latency weights uses fmt.Sscanf which has two issues: it accepts non-finite
values like NaN and Inf, and it performs only partial input matching (e.g.,
"0.6foo" would be parsed as 0.6). To fix this, after parsing each weight value
with fmt.Sscanf for both Cost and Latency, verify that the entire trimmed string
was consumed by the parse operation and reject any non-finite values (NaN or
Inf) using math.IsNaN and math.IsInf checks. Add these validation checks
immediately after each fmt.Sscanf call before the existing non-negative
validation.

In `@internal/router/strategy_balanced.go`:
- Around line 39-45: The zero-value defaulting logic for balanced weights in the
strategy initialization is incorrect because the first condition modifies costW
to 0.6, which then causes the second condition to fail since it checks if costW
== 0. Fix this by changing the second condition from checking costW == 0 to
checking s.CostWeight == 0 instead, so it evaluates the original unmodified
value. This ensures that when both s.CostWeight and s.LatencyWeight are zero,
both costW and latW are properly set to their default values of 0.6 and 0.4
respectively.

In `@internal/router/strategy_cost.go`:
- Around line 33-37: The current implementation in the pricing strategy uses
`max == 0` from minMaxCost() as a sentinel to detect missing pricing data, but
this incorrectly treats legitimate zero-cost pricing as absent pricing and
degrades to latency-only ranking. Create a new helper function (e.g.,
hasPricingData) that directly checks whether pricing information is actually
present for the candidates, rather than inferring presence from numeric extrema.
Replace the condition `if _, max := minMaxCost(candidates); max == 0` with a
call to this new helper function to properly distinguish between zero cost and
missing cost data.

In `@internal/router/strategy_test.go`:
- Around line 115-124: The TestCostOnly_DegradesToLatencyWhenNoPricing test can
still pass via lexicographic tie-break even if the degradation fallback logic
regresses, because both the correct degradation and the lexicographic ordering
would select the same provider. Change the provider names in this test (and the
similar degradation test mentioned in the also-applies-to comment around line
139-148) so that the provider with better latency/lower cost comes
lexicographically after the other one, making fallback and tie-break produce
different winners and ensuring the test truly validates the degradation logic.
Additionally, add a new test for zero-value BalancedStrategy that verifies it
correctly defaults to 0.6/0.4 weight distribution when initialized without
explicit configuration.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8563b06f-d85e-4bbe-8889-05a534a1c750

📥 Commits

Reviewing files that changed from the base of the PR and between 2677c1f and 086564a.

📒 Files selected for processing (27)
  • .env.template
  • CLAUDE.md
  • config/config.example.yaml
  • config/config.go
  • config/router.go
  • internal/core/interfaces.go
  • internal/gateway/inference_prepare.go
  • internal/llmclient/client.go
  • internal/llmclient/latency.go
  • internal/llmclient/latency_test.go
  • internal/providers/init.go
  • internal/providers/registry.go
  • internal/providers/router.go
  • internal/providers/router_strategy_integration_test.go
  • internal/providers/strategy_init.go
  • internal/router/context.go
  • internal/router/registry.go
  • internal/router/registry_test.go
  • internal/router/scoring.go
  • internal/router/strategy.go
  • internal/router/strategy_balanced.go
  • internal/router/strategy_cost.go
  • internal/router/strategy_firstfit.go
  • internal/router/strategy_latency.go
  • internal/router/strategy_test.go
  • internal/server/http.go
  • internal/server/routing_strategy.go

Comment thread config/router.go
Comment on lines +231 to +243
// hasProviderInModel reports whether the model string contains a provider
// prefix (e.g. "deepseek/deepseek-v4-flash"), which means the user explicitly
// chose a provider via the model syntax even though req.Provider is empty.
func hasProviderInModel(model *string) bool {
if model == nil {
return false
}
m := strings.TrimSpace(*model)
firstSlash := strings.IndexByte(m, '/')
// A provider prefix is a non-empty segment before a slash that is NOT itself
// a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b").
return firstSlash > 0 && firstSlash < len(m)-1
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

hasProviderInModel misclassifies slash-style model IDs as provider-pinned.

Line 239 treats any x/y as provider-qualified, so plain model IDs containing / are marked as pinned and never get WithRoutingEligible. That skips intelligent routing for valid unpinned requests.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/gateway/inference_prepare.go` around lines 231 - 243, The
hasProviderInModel function currently returns true for any model string
containing a slash, but it should distinguish between actual provider prefixes
like "deepseek/deepseek-v4-flash" and plain model IDs containing slashes like
"meta-llama/Llama-3-70b". Update the return statement to add additional logic
beyond the current firstSlash check to properly identify whether the segment
before the slash is an actual provider prefix or just part of a model ID name.
This will ensure plain model IDs with slashes are not incorrectly marked as
provider-pinned and can properly use intelligent routing.

Comment on lines +28 to +37
func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) {
tr := NewLatencyTracker()
for i := 0; i < 5; i++ {
tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5
}
rate := tr.ErrorRate()
if rate < 0.3 || rate > 0.9 {
t.Fatalf("expected error rate near 0.4-0.6, got %v", rate)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add a regression test for leading-success error-rate initialization.

Please add a case where several successful samples precede one failure; ErrorRate() should remain below the routing cutoff after a single failure, not jump to 1.0.

Suggested test
 func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) {
 	tr := NewLatencyTracker()
 	for i := 0; i < 5; i++ {
 		tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5
 	}
 	rate := tr.ErrorRate()
 	if rate < 0.3 || rate > 0.9 {
 		t.Fatalf("expected error rate near 0.4-0.6, got %v", rate)
 	}
 }
+
+func TestLatencyTracker_ErrorRateLeadingSuccessesThenOneFailure(t *testing.T) {
+	tr := NewLatencyTracker()
+	for i := 0; i < 5; i++ {
+		tr.Record(10*time.Millisecond, false)
+	}
+	tr.Record(10*time.Millisecond, true)
+
+	if got := tr.ErrorRate(); got >= 0.5 {
+		t.Fatalf("expected error rate below 0.5 after one failure following successes, got %v", got)
+	}
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) {
tr := NewLatencyTracker()
for i := 0; i < 5; i++ {
tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5
}
rate := tr.ErrorRate()
if rate < 0.3 || rate > 0.9 {
t.Fatalf("expected error rate near 0.4-0.6, got %v", rate)
}
}
func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) {
tr := NewLatencyTracker()
for i := 0; i < 5; i++ {
tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5
}
rate := tr.ErrorRate()
if rate < 0.3 || rate > 0.9 {
t.Fatalf("expected error rate near 0.4-0.6, got %v", rate)
}
}
func TestLatencyTracker_ErrorRateLeadingSuccessesThenOneFailure(t *testing.T) {
tr := NewLatencyTracker()
for i := 0; i < 5; i++ {
tr.Record(10*time.Millisecond, false)
}
tr.Record(10*time.Millisecond, true)
if got := tr.ErrorRate(); got >= 0.5 {
t.Fatalf("expected error rate below 0.5 after one failure following successes, got %v", got)
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/llmclient/latency_test.go` around lines 28 - 37, Add a new
regression test function (e.g.,
TestLatencyTracker_LeadingSuccessesDoNotSkewErrorRate) after the existing
TestLatencyTracker_ErrorRateReflectsFailures test that verifies ErrorRate()
behavior when multiple successful samples are recorded before a single failure
occurs. The test should create a NewLatencyTracker, record several successful
samples by calling Record with appropriate parameters indicating success (false
or 0), then record one failure, and finally assert that ErrorRate() returns a
value that does not spike to 1.0 but remains below the routing cutoff threshold,
demonstrating that a single failure after many successes does not incorrectly
skew the error rate calculation.

Comment thread internal/llmclient/latency.go
Comment thread internal/providers/registry.go
Comment thread internal/router/registry_test.go
Comment thread internal/router/registry.go Outdated
Comment thread internal/router/strategy_balanced.go
Comment thread internal/router/strategy_cost.go Outdated
Comment thread internal/router/strategy_test.go
@Barsit

Barsit commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

Review findings assessment

Fixed

# Finding Source Fix
1 BalancedStrategy zero-weight defaulting bug (costW check used modified variable) coderabbitai Changed to check s.CostWeight instead of costW
2 CostOnlyStrategy degrades to latency when free model has pricing coderabbitai, greptile Added hasPricingData() helper, replaces max==0 sentinel
3 LatencyOnlyStrategy same issue coderabbitai Added hasLatencyData() helper
4 EWMA first-sample detection using value==0 treats genuine zero as uninitialized coderabbitai Added initialized bool field to ewma struct
5 ListProvidersForModel returns nil instead of empty slice coderabbitai Changed to return []ModelWithProvider{}
6 WithStrategyOverride stores empty strings coderabbitai Added empty-string guard
7 NaN/Inf accepted as valid config values coderabbitai Added math.IsNaN/IsInf checks in router config + ParseWeights
8 ParseWeights accepts trailing garbage (0.6x,0.4) coderabbitai Added full-string consumption check
9 Model-syntax provider overridden when req.Provider is empty coderabbitai, greptile Added slash-in-model check in resolveProvider eligibility

Assessed as not actionable

Finding Reason
hasProviderInModel misidentifies slash models In GoModel, ANY slash is provider/ID syntax. Slash IDs go through proper fallback. Consistent with existing routing semantics.
Docstring coverage < 80% Standard bot threshold; 45% is reasonable for established codebase.
Patch coverage 69.6% Core strategy/scoring logic has 94% coverage. Lower coverage in wiring/init code that is hard to unit-test in isolation.
Audio/realtime routing These paths do not set the eligibility flag, so strategy does not apply to them by design. Even if it did, callers do capability assertions.
Zero max_error_rate accepted Zero is a documented sentinel for "use default" (0.5), consistent with the rest of the gateway's config patterns.

@Barsit Barsit force-pushed the feat/intelligent-routing branch from d33b07f to dcfce08 Compare June 20, 2026 06:00

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (2)
internal/llmclient/latency_test.go (1)

28-37: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add a regression test for “many successes, then one failure” error-rate behavior.

This specific path is still untested and is the one that previously regressed.

Suggested test case
 func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) {
 	tr := NewLatencyTracker()
 	for i := 0; i < 5; i++ {
 		tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5
 	}
 	rate := tr.ErrorRate()
 	if rate < 0.3 || rate > 0.9 {
 		t.Fatalf("expected error rate near 0.4-0.6, got %v", rate)
 	}
 }
+
+func TestLatencyTracker_ErrorRateLeadingSuccessesThenOneFailure(t *testing.T) {
+	tr := NewLatencyTracker()
+	for i := 0; i < 5; i++ {
+		tr.Record(10*time.Millisecond, false)
+	}
+	tr.Record(10*time.Millisecond, true)
+	if got := tr.ErrorRate(); got >= 0.5 {
+		t.Fatalf("expected error rate below 0.5 after one late failure, got %v", got)
+	}
+}
As per coding guidelines, `**/*_test.go` changes should include behavior-focused coverage for updated logic paths.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/llmclient/latency_test.go` around lines 28 - 37, Add a new
regression test function (separate from
TestLatencyTracker_ErrorRateReflectsFailures) to specifically test the "many
successes followed by one failure" error-rate calculation path. This test should
create a fresh LatencyTracker, record multiple successful calls (with the
success parameter as true), then record a single failure, and verify that the
error rate is calculated correctly for this specific scenario. This ensures the
error rate logic handles the case of predominantly successful operations with a
single failure at the end, which is the code path that previously regressed.

Source: Coding guidelines

internal/router/context.go (1)

35-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Empty strategy override does not actually clear inherited context override.

WithStrategyOverride(ctx, "") returns ctx unchanged, so any parent strategy override remains active despite the “empty clears override” contract. This can keep an unintended override in effect during StrategyRegistry.Resolve.

Suggested minimal fix
-import "context"
+import (
+	"context"
+	"strings"
+)
@@
 func StrategyOverrideFromContext(ctx context.Context) (string, bool) {
 	s, ok := ctx.Value(strategyCtxKey{}).(string)
-	return s, ok
+	if !ok {
+		return "", false
+	}
+	s = strings.TrimSpace(s)
+	if s == "" {
+		return "", false
+	}
+	return s, true
 }
@@
 func WithStrategyOverride(ctx context.Context, strategy string) context.Context {
-	if strategy == "" {
-		return ctx
+	strategy = strings.TrimSpace(strategy)
+	if strategy == "" {
+		// Shadow any inherited override as absent.
+		return context.WithValue(ctx, strategyCtxKey{}, nil)
 	}
 	return context.WithValue(ctx, strategyCtxKey{}, strategy)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/router/context.go` around lines 35 - 49, The WithStrategyOverride
function returns the context unchanged when strategy is an empty string, which
fails to clear inherited overrides from parent contexts. To fix this, remove the
early return condition that checks if strategy is empty, and instead always
execute the context.WithValue call regardless of whether strategy is empty or
not. This ensures that an empty strategy value is properly set in the current
context layer, effectively shadowing and overriding any parent context value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CLAUDE.md`:
- Line 134: The "Intelligent routing" documentation section on line 134 is a
single overly dense bullet containing multiple distinct concepts (routing
strategy defaults, configuration options, EWMA tracking details, header
overrides, and explicit provider behavior). Break this into 4-6 focused
sub-bullets, each addressing one specific topic: one for the routing strategy
configuration and defaults, one for the scoring weights, one for error rate
filtering, one for EWMA latency tracking mechanics, one for request header
overrides, and one for explicit provider-qualified request bypass behavior. This
will improve scannability and make it easier for operators to find and
understand each configuration option independently.

In `@config/router.go`:
- Around line 61-65: The empty-check for cfg.Strategy on line 61 occurs before
trimming and converting to lowercase on line 64, which means a whitespace-only
value like "   " will pass the empty check and later fail validation instead of
defaulting to RouterStrategyBalanced. Move the strings.TrimSpace and
strings.ToLower operations before the empty-check for cfg.Strategy so that
whitespace-only values are properly caught and default to the balanced strategy
as intended.

In `@internal/llmclient/client.go`:
- Around line 231-234: The completeScope function has an inconsistency where the
latencyTracker.Record call uses err != nil to determine failure, but the circuit
breaker completion tracking uses cbErr. This causes routing error-rate telemetry
to miss failures when cbErr is set but err is nil. Update the
latencyTracker.Record call to check for failures consistently by accounting for
both err and cbErr, ensuring that any error tracked by the circuit breaker is
also reflected in the latency telemetry.

In `@internal/router/registry_test.go`:
- Around line 58-75: The TestParseWeights_Valid function only validates a single
canonical numeric format. Add additional test cases to this function to verify
that ParseWeights correctly handles equivalent numeric representations like
"0.70,0.30" and "7e-1,3e-1" which should all parse successfully and produce the
same Weight values (0.7 and 0.3 for Cost and Latency respectively). Update the
test to use a table-driven approach or add separate test cases for each format
variation to ensure regression-proof parser behavior.

In `@internal/router/registry.go`:
- Around line 101-113: The validation logic in the weight parsing code is overly
restrictive by comparing the original input strings (costStr and latStr) against
their formatted parsed values using fmt.Sprintf. This check rejects valid
numeric formats like 0.60 (which becomes 0.6) or scientific notation like 6e-1.
Since fmt.Sscanf has already successfully validated and parsed the input into a
float value, remove the string comparison checks that verify costStr matches
fmt.Sprintf("%v", cost) and latStr matches fmt.Sprintf("%v", lat). Keep only the
fmt.Sscanf validation and error handling, as it is sufficient to ensure the
input is a valid numeric weight.

In `@internal/server/routing_strategy.go`:
- Around line 33-35: The ParseWeights function in internal/router/registry.go is
performing canonical string matching after float parsing, which causes valid
decimal weight headers like "1.0,0.0" to be rejected. Modify the ParseWeights
function to be more tolerant of different valid decimal formats by normalizing
them during parsing instead of performing strict canonical matching. This aligns
with the principle to accept requests generously and adapt them according to
Postel's Law, allowing decimal values in various formats (like "1.0,0.0", "1,0",
etc.) to be accepted and properly parsed without silently ignoring valid input.

---

Duplicate comments:
In `@internal/llmclient/latency_test.go`:
- Around line 28-37: Add a new regression test function (separate from
TestLatencyTracker_ErrorRateReflectsFailures) to specifically test the "many
successes followed by one failure" error-rate calculation path. This test should
create a fresh LatencyTracker, record multiple successful calls (with the
success parameter as true), then record a single failure, and verify that the
error rate is calculated correctly for this specific scenario. This ensures the
error rate logic handles the case of predominantly successful operations with a
single failure at the end, which is the code path that previously regressed.

In `@internal/router/context.go`:
- Around line 35-49: The WithStrategyOverride function returns the context
unchanged when strategy is an empty string, which fails to clear inherited
overrides from parent contexts. To fix this, remove the early return condition
that checks if strategy is empty, and instead always execute the
context.WithValue call regardless of whether strategy is empty or not. This
ensures that an empty strategy value is properly set in the current context
layer, effectively shadowing and overriding any parent context value.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 619e37f2-39dc-4d85-a0ff-6c5c614cbd21

📥 Commits

Reviewing files that changed from the base of the PR and between d33b07f and dcfce08.

📒 Files selected for processing (27)
  • .env.template
  • CLAUDE.md
  • config/config.example.yaml
  • config/config.go
  • config/router.go
  • internal/core/interfaces.go
  • internal/gateway/inference_prepare.go
  • internal/llmclient/client.go
  • internal/llmclient/latency.go
  • internal/llmclient/latency_test.go
  • internal/providers/init.go
  • internal/providers/registry.go
  • internal/providers/router.go
  • internal/providers/router_strategy_integration_test.go
  • internal/providers/strategy_init.go
  • internal/router/context.go
  • internal/router/registry.go
  • internal/router/registry_test.go
  • internal/router/scoring.go
  • internal/router/strategy.go
  • internal/router/strategy_balanced.go
  • internal/router/strategy_cost.go
  • internal/router/strategy_firstfit.go
  • internal/router/strategy_latency.go
  • internal/router/strategy_test.go
  • internal/server/http.go
  • internal/server/routing_strategy.go

Comment thread CLAUDE.md
- **Guardrails:** Configured via `config/config.yaml` only (except `GUARDRAILS_ENABLED` env var)
- **Providers:** `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `USE_GOOGLE_GEMINI_NATIVE_API` (true by default; false uses Gemini's OpenAI-compatible chat API), `XAI_API_KEY`, `GROQ_API_KEY`, `OPENROUTER_API_KEY`, `ZAI_API_KEY`, `ZAI_BASE_URL` (optional Z.ai endpoint override), `MINIMAX_API_KEY`, `MINIMAX_BASE_URL` (optional MiniMax endpoint override), `XIAOMI_API_KEY`, `XIAOMI_BASE_URL` (optional Xiaomi MiMo endpoint override), `OPENCODE_GO_API_KEY`, `OPENCODE_GO_BASE_URL` (optional OpenCode Go/Zen endpoint override; default `https://opencode.ai/zen/go/v1`), `OPENCODE_GO_MESSAGES_MODELS` (optional comma-separated model IDs routed to the Anthropic-native `/messages` endpoint instead of `/chat/completions`; default `qwen3.7-max`), `BAILIAN_API_KEY`, `BAILIAN_BASE_URL` (optional Bailian base URL for region switching; default `https://dashscope.aliyuncs.com/compatible-mode/v1`), `AZURE_API_KEY`, `AZURE_BASE_URL` (Azure OpenAI deployment base URL), `AZURE_API_VERSION` (optional Azure API version), `ORACLE_API_KEY` (Oracle API key), `ORACLE_BASE_URL` (Oracle OpenAI-compatible base URL), `<PROVIDER>[_SUFFIX]_MODELS` (comma-separated configured model list for any provider type), `OLLAMA_BASE_URL`, `VLLM_BASE_URL`, `VLLM_API_KEY` (optional upstream vLLM bearer token)
- **Provider model metadata:** `providers.<name>.models` accepts either model IDs (strings) or `{id, metadata}` objects. When `metadata` is supplied (`display_name`, `context_window`, `max_output_tokens`, `modes`, `capabilities`, `pricing`, …) it is merged onto the remote ai-model-list entry during enrichment, with operator values winning per-field. Primary use case: advertising context windows, capabilities, and pricing for local models (Ollama) and other custom endpoints whose IDs are not in the upstream registry.
- **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, the gateway scores candidates and routes to the best one (single-provider configs are unaffected — one candidate means no decision). `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`), `MODEL_ROUTING_STRATEGY_WEIGHTS` (`0.6,0.4` = cost_weight,latency_weight; balanced only), `MODEL_ROUTING_MAX_ERROR_RATE` (`0.5`; candidates at/above this smoothed error rate are filtered before scoring). Latency is tracked per provider as in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate), never persisted. Per-request overrides via request headers `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` (`cost,latency`); invalid header values are silently dropped (WARN log) and fall back to the global strategy. Explicit provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Split this into short sub-bullets for scanability.

Line 134 packs defaults, runtime behavior, EWMA internals, and header override semantics into one dense line. Breaking it into 4–6 bullets will make operator usage much easier to parse.

Suggested structure
-- **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, ...
+- **Intelligent routing:** enabled when request has no pinned provider and multiple providers serve the same model.
+  - **Defaults:** `MODEL_ROUTING_STRATEGY=balanced`, `MODEL_ROUTING_STRATEGY_WEIGHTS=0.6,0.4`, `MODEL_ROUTING_MAX_ERROR_RATE=0.5`
+  - **Filtering:** candidates at/above `max_error_rate` are excluded before scoring.
+  - **Latency stats:** in-memory EWMA only (not persisted).
+  - **Per-request overrides:** `X-GoModel-Routing-Strategy`, `X-GoModel-Routing-Weights`; invalid values are ignored with WARN logs.
+  - **Bypass rule:** provider-qualified requests like `openai-east/gpt-4o` bypass intelligent routing.

As per coding guidelines, “Documentation should be concise, practical, and user-focused. Show defaults, explain when to change them, and include minimal examples when useful.”

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, the gateway scores candidates and routes to the best one (single-provider configs are unaffected — one candidate means no decision). `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`), `MODEL_ROUTING_STRATEGY_WEIGHTS` (`0.6,0.4` = cost_weight,latency_weight; balanced only), `MODEL_ROUTING_MAX_ERROR_RATE` (`0.5`; candidates at/above this smoothed error rate are filtered before scoring). Latency is tracked per provider as in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate), never persisted. Per-request overrides via request headers `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` (`cost,latency`); invalid header values are silently dropped (WARN log) and fall back to the global strategy. Explicit provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing.
- **Intelligent routing:** enabled when request has no pinned provider and multiple providers serve the same model.
- **Defaults:** `MODEL_ROUTING_STRATEGY=balanced`, `MODEL_ROUTING_STRATEGY_WEIGHTS=0.6,0.4`, `MODEL_ROUTING_MAX_ERROR_RATE=0.5`
- **Filtering:** candidates at/above `max_error_rate` are excluded before scoring.
- **Latency stats:** in-memory EWMA only (not persisted).
- **Per-request overrides:** `X-GoModel-Routing-Strategy`, `X-GoModel-Routing-Weights`; invalid values are ignored with WARN logs.
- **Bypass rule:** provider-qualified requests like `openai-east/gpt-4o` bypass intelligent routing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CLAUDE.md` at line 134, The "Intelligent routing" documentation section on
line 134 is a single overly dense bullet containing multiple distinct concepts
(routing strategy defaults, configuration options, EWMA tracking details, header
overrides, and explicit provider behavior). Break this into 4-6 focused
sub-bullets, each addressing one specific topic: one for the routing strategy
configuration and defaults, one for the scoring weights, one for error rate
filtering, one for EWMA latency tracking mechanics, one for request header
overrides, and one for explicit provider-qualified request bypass behavior. This
will improve scannability and make it easier for operators to find and
understand each configuration option independently.

Source: Coding guidelines

Comment thread config/router.go
Comment on lines 231 to 234
func (c *Client) completeScope(scope requestScope, statusCode int, err, cbErr error) {
c.recordCircuitBreakerCompletion(statusCode, cbErr)
c.latencyTracker.Record(time.Since(scope.startedAt), err != nil)
c.finishRequest(scope, statusCode, err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Error-rate telemetry can miss failures tracked by circuit-breaker path.

completeScope records routing error EWMA with err != nil, but CB accounting uses cbErr. If cbErr is set while err is nil, routing error-rate remains artificially low.

Suggested minimal alignment
 func (c *Client) completeScope(scope requestScope, statusCode int, err, cbErr error) {
 	c.recordCircuitBreakerCompletion(statusCode, cbErr)
-	c.latencyTracker.Record(time.Since(scope.startedAt), err != nil)
+	c.latencyTracker.Record(time.Since(scope.startedAt), err != nil || cbErr != nil)
 	c.finishRequest(scope, statusCode, err)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/llmclient/client.go` around lines 231 - 234, The completeScope
function has an inconsistency where the latencyTracker.Record call uses err !=
nil to determine failure, but the circuit breaker completion tracking uses
cbErr. This causes routing error-rate telemetry to miss failures when cbErr is
set but err is nil. Update the latencyTracker.Record call to check for failures
consistently by accounting for both err and cbErr, ensuring that any error
tracked by the circuit breaker is also reflected in the latency telemetry.

Comment on lines +58 to +75
func TestParseWeights_Valid(t *testing.T) {
w, err := ParseWeights("0.7,0.3")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if w.Cost != 0.7 || w.Latency != 0.3 {
t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency)
}
}

func TestParseWeights_Invalid(t *testing.T) {
cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"}
for _, c := range cases {
if _, err := ParseWeights(c); err == nil {
t.Errorf("expected error for %q", c)
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add positive parsing cases for equivalent numeric formats.

TestParseWeights_Valid currently validates only one canonical representation. Please add accepted-format cases like 0.70,0.30 and 7e-1,3e-1 so parser behavior is regression-proof after the ParseWeights fix.

💡 Minimal test update
 func TestParseWeights_Valid(t *testing.T) {
-	w, err := ParseWeights("0.7,0.3")
-	if err != nil {
-		t.Fatalf("unexpected error: %v", err)
-	}
-	if w.Cost != 0.7 || w.Latency != 0.3 {
-		t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency)
+	cases := []string{
+		"0.7,0.3",
+		"0.70,0.30",
+		"7e-1,3e-1",
+	}
+	for _, tc := range cases {
+		w, err := ParseWeights(tc)
+		if err != nil {
+			t.Fatalf("unexpected error for %q: %v", tc, err)
+		}
+		if w.Cost != 0.7 || w.Latency != 0.3 {
+			t.Fatalf("expected 0.7,0.3 for %q got %v,%v", tc, w.Cost, w.Latency)
+		}
 	}
 }

As per coding guidelines, tests should be updated for behavior changes and parser error-handling boundaries.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func TestParseWeights_Valid(t *testing.T) {
w, err := ParseWeights("0.7,0.3")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if w.Cost != 0.7 || w.Latency != 0.3 {
t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency)
}
}
func TestParseWeights_Invalid(t *testing.T) {
cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"}
for _, c := range cases {
if _, err := ParseWeights(c); err == nil {
t.Errorf("expected error for %q", c)
}
}
}
func TestParseWeights_Valid(t *testing.T) {
cases := []string{
"0.7,0.3",
"0.70,0.30",
"7e-1,3e-1",
}
for _, tc := range cases {
w, err := ParseWeights(tc)
if err != nil {
t.Fatalf("unexpected error for %q: %v", tc, err)
}
if w.Cost != 0.7 || w.Latency != 0.3 {
t.Fatalf("expected 0.7,0.3 for %q got %v,%v", tc, w.Cost, w.Latency)
}
}
}
func TestParseWeights_Invalid(t *testing.T) {
cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"}
for _, c := range cases {
if _, err := ParseWeights(c); err == nil {
t.Errorf("expected error for %q", c)
}
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/router/registry_test.go` around lines 58 - 75, The
TestParseWeights_Valid function only validates a single canonical numeric
format. Add additional test cases to this function to verify that ParseWeights
correctly handles equivalent numeric representations like "0.70,0.30" and
"7e-1,3e-1" which should all parse successfully and produce the same Weight
values (0.7 and 0.3 for Cost and Latency respectively). Update the test to use a
table-driven approach or add separate test cases for each format variation to
ensure regression-proof parser behavior.

Source: Coding guidelines

Comment thread internal/router/registry.go Outdated
Comment on lines +33 to +35
if weights := strings.TrimSpace(c.Request().Header.Get(RoutingWeightsHeader)); weights != "" {
if override, err := router.ParseWeights(weights); err != nil {
slog.WarnContext(ctx, "invalid routing weights header, ignoring",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Valid decimal weight headers are being rejected in the override path.

Line 34 depends on router.ParseWeights, which currently performs canonical string matching after float parsing; values like 1.0,0.0 get treated as invalid and silently ignored. That breaks common operator input and conflicts with the “accept requests generously” contract.

Suggested fix (in internal/router/registry.go)
-import "fmt"
+import (
+  "fmt"
+  "strconv"
+)

- n, err := fmt.Sscanf(costStr, "%f", &cost)
- if err != nil || n != 1 {
+ cost, err := strconv.ParseFloat(costStr, 64)
+ if err != nil {
    return WeightsOverride{}, fmt.Errorf("invalid cost weight %q: parse error", costStr)
  }
- if costStr != fmt.Sprintf("%v", cost) {
-   return WeightsOverride{}, fmt.Errorf("invalid cost weight %q: trailing characters", parts[0])
- }

- n, err = fmt.Sscanf(latStr, "%f", &lat)
- if err != nil || n != 1 {
+ lat, err := strconv.ParseFloat(latStr, 64)
+ if err != nil {
    return WeightsOverride{}, fmt.Errorf("invalid latency weight %q: parse error", latStr)
  }
- if latStr != fmt.Sprintf("%v", lat) {
-   return WeightsOverride{}, fmt.Errorf("invalid latency weight %q: trailing characters", parts[1])
- }

As per coding guidelines, **/*.go should “Accept requests generously … and adapt them”; based on learnings, this should follow Postel’s Law for tolerant request parsing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/server/routing_strategy.go` around lines 33 - 35, The ParseWeights
function in internal/router/registry.go is performing canonical string matching
after float parsing, which causes valid decimal weight headers like "1.0,0.0" to
be rejected. Modify the ParseWeights function to be more tolerant of different
valid decimal formats by normalizing them during parsing instead of performing
strict canonical matching. This aligns with the principle to accept requests
generously and adapt them according to Postel's Law, allowing decimal values in
various formats (like "1.0,0.0", "1,0", etc.) to be accepted and properly parsed
without silently ignoring valid input.

Sources: Coding guidelines, Learnings

@Barsit Barsit force-pushed the feat/intelligent-routing branch from dcfce08 to d07a10e Compare June 20, 2026 06:41
This feature enables the gateway to proactively select the best provider
when multiple providers serve the same model, combining cost and latency
signals into a weighted score. Previously the gateway used a deterministic
first-wins strategy: the first registered provider always handled the
request, regardless of cost or performance.

=== Architecture ===

- Pluggable RoutingStrategy interface (balanced, cost_only, latency_only,
  first_fit) with extensible registry
- BalancedStrategy (default): weighted scoring (cost 0.6, latency 0.4),
  lower score wins; missing data scores 1.0 (worst) to prefer
  complete-data providers; ties broken by provider name lexicographic order
- CostOnlyStrategy / LatencyOnlyStrategy: single-factor ranking, degrade
  to the other dimension when every candidate lacks their primary signal
  (without the mutual-recursion bug that was identified and fixed)
- FirstFitStrategy: first acceptable candidate in registration order,
  matching historical first-wins behaviour

=== Latency data collection ===

- EWMA-based LatencyTracker in llmclient.Client (in-memory, ~32 bytes per
  provider, alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate, never persisted)
- ProviderStats optional interface exposed by llmclient.Client so the
  router can read P50Latency, P99Latency, ErrorRate, and CircuitState
  via a single type assertion
- completeScope() is the single Record() call site

=== Candidate selection ===

- collectCandidates(modelID) enumerates providers serving modelID via the
  new O(k) ListProvidersForModel registry method (registration-order aware)
- Candidates with circuit open or error rate at/above 0.5 are filtered
  before scoring. When every candidate is filtered the caller falls back
  to the first candidate with a WARN log
- Explicit provider-qualified requests bypass intelligent routing;
  single-provider configs are unaffected (one candidate, no decision)

=== Per-request overrides ===

- X-GoModel-Routing-Strategy and X-GoModel-Routing-Weights headers;
  invalid values silently fall back (WARN log)
- RoutingEligible context flag set during prepare for unqualified requests

=== Config ===

- MODEL_ROUTING_STRATEGY, MODEL_ROUTING_STRATEGY_WEIGHTS,
  MODEL_ROUTING_MAX_ERROR_RATE env vars; router block in config.yaml

=== Bugs fixed during development ===

1. Mutual recursion: cost/latency strategies recursed when both
   dimensions absent. Fixed by computing fallback scores inline
2. Free model mis-classification: normalizeCost treated cost=0 as
   unknown. Fixed with -1 sentinel
3. Production path never executed strategy: PrepareChatRequest set
   req.Provider, resolveProvider skipped due to providerHint != "".
   Fixed via RoutingEligible context flag
4. FirstFit used alpha-sorted order instead of registration order.
   Fixed via ListProvidersForModel with registration-order tracking
5. BalancedStrategy zero-weight default used modified variable instead
   of original, causing LatencyWeight to stay 0 when both weights were 0
6. CostOnly degraded when free model present (max==0 sentinel).
   Fixed with explicit hasPricingData() helper
7. ParseWeights accepted NaN/Inf/trailing-garbage. Fixed with math
   validation and full-string consumption check
8. Model-syntax provider ("pricey/gpt-4o") overridden when req.Provider
   was empty. Fixed with slash-in-model check in resolveProvider

=== Testing ===

- 38+ unit tests covering every strategy, edge cases, registry,
  context helpers, and EWMA convergence
- 9 integration tests covering cheaper/faster selection, header
  overrides, invalid override fallback, explicit provider bypass,
  single-candidate no-op, weights override, FirstFit registration
  order, and model-syntax provider pinning
- lint: 0 issues, full project build passes
@Barsit Barsit force-pushed the feat/intelligent-routing branch from d07a10e to 8ea9477 Compare June 20, 2026 06:41
@Barsit

Barsit commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

Additional fix pushed:

  • ParseWeights trailing-zeros regression: The Sprintf full-string consumption check I added earlier incorrectly rejected valid inputs like "0.60,0.40" (because Sprintf("%v", 0.6) produces "0.6", not "0.60"). Fixed by using Sscanf("%f%s") with a trailing string variable — n=1 means exactly the float was consumed (no trailing chars), while n=2 means there's trailing garbage. Also expanded TestParseWeights_Valid to cover trailing zeros, whitespace, whole numbers, and zero-cost cases.

All previous findings (balanced zero-weight bug, free pricing degradation, NaN/Inf validation, EWMA init flag, model-syntax provider override, etc.) remain fixed in the squashed commit.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
coverage-bailian (1)

1-36: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove generated test coverage artifact from version control.

The coverage-bailian file is a Go test coverage profile generated at test time via go test -coverprofile=... (per Makefile line 51). Generated artifacts should never be committed to version control because:

  1. Causes constant diffs: Each test run produces different coverage data (different lines executed, percentages change).
  2. Obscures real changes: Coverage artifacts clutter PR diffs and hide actual code changes from review.
  3. Violates twelve-factor principles: Generated build outputs belong in CI/CD artifact storage, not git history.
  4. Risk of merge conflicts: Multiple developers/CI runs will cause conflicting coverage changes.

Action: Remove this file from the repository and add it to .gitignore alongside the standard coverage.out and other generated test artifacts. If coverage tracking is needed, store baselines in CI/CD systems or external coverage dashboards (Codecov, Coveralls, etc.), not git.

[major_issue]

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@coverage-bailian` around lines 1 - 36, Remove the coverage-bailian file from
version control as it is a generated test coverage artifact that should not be
committed. Additionally, add coverage-bailian to the .gitignore file to prevent
it from being tracked in future commits, grouping it with other generated
coverage files like coverage.out and similar test artifacts. Use git to remove
the file from tracking while keeping it locally if needed for local testing
purposes.

Source: Learnings

♻️ Duplicate comments (1)
CLAUDE.md (1)

134-134: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Split this dense line into focused sub-bullets for scannability.

Line 134 packs routing strategy defaults, weight configuration, error-rate filtering, EWMA tracking mechanics, header overrides, and explicit provider bypass logic into one wrapped sentence. Breaking it into 4–6 sub-bullets will make operator usage much easier to understand and reference independently.

As per the coding guidelines, documentation "should be concise, practical, and user-focused." Scannable structure directly supports that goal.

Suggested structure
- **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, the gateway scores candidates and routes to the best one (single-provider configs are unaffected — one candidate means no decision). `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`), `MODEL_ROUTING_STRATEGY_WEIGHTS` (`0.6,0.4` = cost_weight,latency_weight; balanced only), `MODEL_ROUTING_MAX_ERROR_RATE` (`0.5`; candidates at/above this smoothed error rate are filtered before scoring). Latency is tracked per provider as in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate), never persisted. Per-request overrides via request headers `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` (`cost,latency`); invalid header values are silently dropped (WARN log) and fall back to the global strategy. Explicit provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing.
+- **Intelligent routing:** enabled when request has no pinned provider and multiple providers serve the same model (single-provider configs unaffected).
+  - **Configuration:** `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`)
+  - **Balanced weights:** `MODEL_ROUTING_STRATEGY_WEIGHTS` as `cost_weight,latency_weight` (default `0.6,0.4`; balanced strategy only)
+  - **Filtering:** `MODEL_ROUTING_MAX_ERROR_RATE` (default `0.5`); candidates at/above this smoothed error rate are excluded before scoring
+  - **Latency stats:** in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate); not persisted
+  - **Per-request overrides:** `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` headers; invalid values are WARN-logged and fall back to global strategy
+  - **Bypass rule:** provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CLAUDE.md` at line 134, The documentation for intelligent routing in
CLAUDE.md contains a single very long sentence that covers multiple distinct
configuration options and behaviors all at once, making it difficult for users
to scan and reference individual features. Break this single dense bullet point
into 4-6 separate focused sub-bullets, each covering one specific aspect:
routing strategy options and defaults (MODEL_ROUTING_STRATEGY), weight
configuration for balanced mode (MODEL_ROUTING_STRATEGY_WEIGHTS), error-rate
filtering behavior (MODEL_ROUTING_MAX_ERROR_RATE), EWMA latency tracking
mechanics, per-request header overrides and their fallback behavior, and
explicit provider-qualified request bypass logic. Ensure each sub-bullet is
concise and independently understandable while maintaining the technical
accuracy of the original content.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/gateway/inference_prepare.go`:
- Around line 240-242: The comment preceding the return statement in this helper
function contradicts the actual behavior of the logic. The return condition
`firstSlash > 0 && firstSlash < len(m)-1` treats slash-style model IDs like
"meta-llama/Llama-3-70b" as provider-qualified (returns true), but the comment
claims these are not treated as provider-qualified. Update the comment to
accurately reflect that the function returns true when there is a valid provider
prefix followed by a model ID (i.e., when a slash exists that is neither at the
start nor at the end of the string).

In `@internal/llmclient/latency.go`:
- Around line 86-103: The Record method in LatencyTracker truncates
sub-millisecond latencies to zero when converting to milliseconds using
duration.Milliseconds(), which returns an integer. To preserve sub-millisecond
precision and avoid losing valid fast latency measurements, modify the
conversion to use floating point arithmetic without truncation. Convert the
duration to milliseconds as a float64 by either using duration.Seconds() and
multiplying by 1000, or by dividing duration.Nanoseconds() by 1e6, then pass
this precise value to the Add methods for t.p50 and t.p99.

In `@internal/providers/router_strategy_integration_test.go`:
- Line 75: The NewRouter constructor error is being discarded using the blank
identifier in the router initialization across multiple test locations. Replace
the blank identifier pattern with proper error handling by assigning the error
return value to a variable, then check if the error is not nil and immediately
fail the test using t.Fatalf with a descriptive message. Apply this same
error-checking pattern consistently at each NewRouter call site throughout the
integration test file to ensure setup failures are caught immediately rather
than causing misleading downstream assertion failures.

In `@internal/router/scoring.go`:
- Around line 165-176: The perMtokCost function currently returns an implicit 0
when Pricing exists but both InputPerMtok and OutputPerMtok are nil, which
incorrectly treats non-per-token pricing models as free. Add a check after
attempting to accumulate the token costs to return -1 if neither InputPerMtok
nor OutputPerMtok is present, ensuring unknown per-token pricing is treated as
unavailable rather than zero cost. Apply the same fix to the related code at
lines 201-213 that has similar logic.

In `@internal/router/strategy_test.go`:
- Around line 103-149: Add new test functions to cover the edge case where
candidates have pricing metadata but lack InputPerMtok and OutputPerMtok values.
Create test cases for both
TestCostOnly_DegradesToLatencyWhenPricingHasNoPerTokenCost and
TestLatencyOnly_DegradesToCostWhenPricingHasNoPerTokenCost that use the pricing
helper function but verify the strategies properly degrade to their fallback
metrics (latency for cost-only strategy, cost for latency-only strategy) when
per-token pricing values are absent. This ensures the selection logic correctly
treats incomplete pricing data as missing pricing for scoring purposes.

In `@internal/server/routing_strategy.go`:
- Around line 29-31: After trimming whitespace from the RoutingStrategyHeader
using strings.TrimSpace, also normalize the strategy value to lowercase before
passing it to router.WithStrategyOverride(). This ensures that case variants
like Latency_Only are properly handled and not silently dropped later due to
case mismatches, making the request parsing more permissive and following
Postel's Law for tolerant header handling.

---

Outside diff comments:
In `@coverage-bailian`:
- Around line 1-36: Remove the coverage-bailian file from version control as it
is a generated test coverage artifact that should not be committed.
Additionally, add coverage-bailian to the .gitignore file to prevent it from
being tracked in future commits, grouping it with other generated coverage files
like coverage.out and similar test artifacts. Use git to remove the file from
tracking while keeping it locally if needed for local testing purposes.

---

Duplicate comments:
In `@CLAUDE.md`:
- Line 134: The documentation for intelligent routing in CLAUDE.md contains a
single very long sentence that covers multiple distinct configuration options
and behaviors all at once, making it difficult for users to scan and reference
individual features. Break this single dense bullet point into 4-6 separate
focused sub-bullets, each covering one specific aspect: routing strategy options
and defaults (MODEL_ROUTING_STRATEGY), weight configuration for balanced mode
(MODEL_ROUTING_STRATEGY_WEIGHTS), error-rate filtering behavior
(MODEL_ROUTING_MAX_ERROR_RATE), EWMA latency tracking mechanics, per-request
header overrides and their fallback behavior, and explicit provider-qualified
request bypass logic. Ensure each sub-bullet is concise and independently
understandable while maintaining the technical accuracy of the original content.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 381fa4cf-069d-4358-b9be-aa1adcfa84bb

📥 Commits

Reviewing files that changed from the base of the PR and between dcfce08 and 8ea9477.

⛔ Files ignored due to path filters (3)
  • c.out is excluded by !**/*.out
  • coverage-bailian.out is excluded by !**/*.out
  • gomodel.exe is excluded by !**/*.exe
📒 Files selected for processing (36)
  • .env.template
  • .gitignore
  • .idea/.gitignore
  • .idea/GoModel.iml
  • .idea/go.imports.xml
  • .idea/golinter.xml
  • .idea/modules.xml
  • .idea/vcs.xml
  • CLAUDE.md
  • config/config.example.yaml
  • config/config.go
  • config/router.go
  • coverage-bailian
  • gomodel.exe~
  • internal/core/interfaces.go
  • internal/gateway/inference_prepare.go
  • internal/llmclient/client.go
  • internal/llmclient/latency.go
  • internal/llmclient/latency_test.go
  • internal/providers/init.go
  • internal/providers/registry.go
  • internal/providers/router.go
  • internal/providers/router_strategy_integration_test.go
  • internal/providers/strategy_init.go
  • internal/router/context.go
  • internal/router/registry.go
  • internal/router/registry_test.go
  • internal/router/scoring.go
  • internal/router/strategy.go
  • internal/router/strategy_balanced.go
  • internal/router/strategy_cost.go
  • internal/router/strategy_firstfit.go
  • internal/router/strategy_latency.go
  • internal/router/strategy_test.go
  • internal/server/http.go
  • internal/server/routing_strategy.go

Comment on lines +240 to +242
// A provider prefix is a non-empty segment before a slash that is NOT itself
// a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b").
return firstSlash > 0 && firstSlash < len(m)-1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Comment contradicts the helper’s actual behavior.

The comment says slash-style model IDs like meta-llama/Llama-3-70b are not treated as provider-qualified, but the return condition does treat that shape as true. Please align the comment with the implemented semantics (or adjust the logic if that comment reflects the real intent).

✏️ Minimal comment-only fix
-	// A provider prefix is a non-empty segment before a slash that is NOT itself
-	// a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b").
+	// Treat any non-edge "x/y" shape as provider-qualified model syntax.
+	// This intentionally includes slash-style model IDs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/gateway/inference_prepare.go` around lines 240 - 242, The comment
preceding the return statement in this helper function contradicts the actual
behavior of the logic. The return condition `firstSlash > 0 && firstSlash <
len(m)-1` treats slash-style model IDs like "meta-llama/Llama-3-70b" as
provider-qualified (returns true), but the comment claims these are not treated
as provider-qualified. Update the comment to accurately reflect that the
function returns true when there is a valid provider prefix followed by a model
ID (i.e., when a slash exists that is neither at the start nor at the end of the
string).

Comment on lines +86 to +103
func (t *LatencyTracker) Record(duration time.Duration, isError bool) {
ms := float64(duration.Milliseconds())
t.p50.Add(ms)
t.p99.Add(ms)
t.errorRate.Add(boolToFloat64(isError))
}

// P50 returns the approximate p50 latency as a time.Duration.
// Returns zero before any sample has been recorded.
func (t *LatencyTracker) P50() time.Duration {
return time.Duration(t.p50.Value()) * time.Millisecond
}

// P99 returns the approximate p99 latency as a time.Duration.
// Returns zero before any sample has been recorded.
func (t *LatencyTracker) P99() time.Duration {
return time.Duration(t.p99.Value()) * time.Millisecond
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Sub-millisecond latency samples are collapsed to “unknown.”

Line 87 truncates to whole milliseconds, so valid <1ms samples become 0; with routing semantics, that can be interpreted as missing latency data instead of observed fast latency.

Suggested fix
 func (t *LatencyTracker) Record(duration time.Duration, isError bool) {
-	ms := float64(duration.Milliseconds())
+	ms := float64(duration) / float64(time.Millisecond)
 	t.p50.Add(ms)
 	t.p99.Add(ms)
 	t.errorRate.Add(boolToFloat64(isError))
 }
@@
 func (t *LatencyTracker) P50() time.Duration {
-	return time.Duration(t.p50.Value()) * time.Millisecond
+	return time.Duration(t.p50.Value() * float64(time.Millisecond))
 }
@@
 func (t *LatencyTracker) P99() time.Duration {
-	return time.Duration(t.p99.Value()) * time.Millisecond
+	return time.Duration(t.p99.Value() * float64(time.Millisecond))
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/llmclient/latency.go` around lines 86 - 103, The Record method in
LatencyTracker truncates sub-millisecond latencies to zero when converting to
milliseconds using duration.Milliseconds(), which returns an integer. To
preserve sub-millisecond precision and avoid losing valid fast latency
measurements, modify the conversion to use floating point arithmetic without
truncation. Convert the duration to milliseconds as a float64 by either using
duration.Seconds() and multiplying by 1000, or by dividing
duration.Nanoseconds() by 1e6, then pass this precise value to the Add methods
for t.p50 and t.p99.

registerModelWithMetadata(t, registry, pricey, "openai-west", "openai", "gpt-4o",
&core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(20)})

r, _ := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stop discarding NewRouter errors in integration tests.

These call sites ignore constructor errors, which can hide setup failures and produce misleading downstream assertions. Capture and fail immediately.

Suggested pattern (apply at each call site)
-	r, _ := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
+	r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
+	if err != nil {
+		t.Fatalf("unexpected router construction error: %v", err)
+	}

As per coding guidelines, **/*_test.go should cover error handling for behavior changes.

Also applies to: 101-101, 128-128, 163-163, 189-189, 211-211, 237-237, 275-275, 303-303

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/providers/router_strategy_integration_test.go` at line 75, The
NewRouter constructor error is being discarded using the blank identifier in the
router initialization across multiple test locations. Replace the blank
identifier pattern with proper error handling by assigning the error return
value to a variable, then check if the error is not nil and immediately fail the
test using t.Fatalf with a descriptive message. Apply this same error-checking
pattern consistently at each NewRouter call site throughout the integration test
file to ensure setup failures are caught immediately rather than causing
misleading downstream assertion failures.

Source: Coding guidelines

Comment on lines +165 to +176
func perMtokCost(c *ProviderCandidate) float64 {
if c == nil || c.Pricing == nil {
return -1
}
var cost float64
if c.Pricing.InputPerMtok != nil {
cost += *c.Pricing.InputPerMtok
}
if c.Pricing.OutputPerMtok != nil {
cost += *c.Pricing.OutputPerMtok
}
return cost

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat non-per-token pricing as unknown instead of free.

Line 165 currently returns 0 when Pricing exists but both InputPerMtok and OutputPerMtok are nil. Combined with Line 201, candidates that only have PerRequest/PerImage/tier pricing are considered “pricing-present” and then scored as cheapest (0) in per-token routing. That produces wrong provider selection.

💡 Minimal fix
 func perMtokCost(c *ProviderCandidate) float64 {
 	if c == nil || c.Pricing == nil {
 		return -1
 	}
+	if c.Pricing.InputPerMtok == nil && c.Pricing.OutputPerMtok == nil {
+		return -1
+	}
 	var cost float64
 	if c.Pricing.InputPerMtok != nil {
 		cost += *c.Pricing.InputPerMtok
 	}
 	if c.Pricing.OutputPerMtok != nil {
 		cost += *c.Pricing.OutputPerMtok
 	}
 	return cost
 }
 
 func hasPricingData(candidates []ProviderCandidate) bool {
 	for i := range candidates {
 		c := &candidates[i]
-		if c.Pricing == nil {
-			continue
-		}
-		if c.Pricing.InputPerMtok != nil || c.Pricing.OutputPerMtok != nil ||
-			c.Pricing.CachedInputPerMtok != nil || c.Pricing.CacheWritePerMtok != nil ||
-			c.Pricing.PerRequest != nil || c.Pricing.PerImage != nil ||
-			c.Pricing.InputPerImage != nil || c.Pricing.PerSecondInput != nil ||
-			c.Pricing.PerSecondOutput != nil || c.Pricing.PerCharacterInput != nil ||
-			c.Pricing.PerPage != nil || len(c.Pricing.Tiers) > 0 {
+		if c.Pricing != nil &&
+			(c.Pricing.InputPerMtok != nil || c.Pricing.OutputPerMtok != nil) {
 			return true
 		}
 	}
 	return false
 }

Also applies to: 201-213

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/router/scoring.go` around lines 165 - 176, The perMtokCost function
currently returns an implicit 0 when Pricing exists but both InputPerMtok and
OutputPerMtok are nil, which incorrectly treats non-per-token pricing models as
free. Add a check after attempting to accumulate the token costs to return -1 if
neither InputPerMtok nor OutputPerMtok is present, ensuring unknown per-token
pricing is treated as unavailable rather than zero cost. Apply the same fix to
the related code at lines 201-213 that has similar logic.

Comment on lines +103 to +149
func TestCostOnly_PicksCheapest(t *testing.T) {
s := NewCostOnlyStrategy()
cands := candidates(
ProviderCandidate{ProviderName: "alpha", Pricing: pricing(5, 15), Latency: 10 * time.Millisecond},
ProviderCandidate{ProviderName: "zeta", Pricing: pricing(1, 4), Latency: 500 * time.Millisecond},
)
got := mustSelect(t, s, cands)
if got.ProviderName != "zeta" {
t.Fatalf("expected azure (cheapest), got %s", got.ProviderName)
}
}

func TestCostOnly_DegradesToLatencyWhenNoPricing(t *testing.T) {
s := NewCostOnlyStrategy()
cands := candidates(
ProviderCandidate{ProviderName: "alpha", Latency: 200 * time.Millisecond},
ProviderCandidate{ProviderName: "zeta", Latency: 50 * time.Millisecond},
)
got := mustSelect(t, s, cands)
if got.ProviderName != "zeta" {
t.Fatalf("expected azure (degraded to latency), got %s", got.ProviderName)
}
}

func TestLatencyOnly_PicksFastest(t *testing.T) {
s := NewLatencyOnlyStrategy()
cands := candidates(
ProviderCandidate{ProviderName: "alpha", Pricing: pricing(1, 1), Latency: 300 * time.Millisecond},
ProviderCandidate{ProviderName: "zeta", Pricing: pricing(50, 50), Latency: 50 * time.Millisecond},
)
got := mustSelect(t, s, cands)
if got.ProviderName != "zeta" {
t.Fatalf("expected azure (fastest), got %s", got.ProviderName)
}
}

func TestLatencyOnly_DegradesToCostWhenNoLatency(t *testing.T) {
s := NewLatencyOnlyStrategy()
cands := candidates(
ProviderCandidate{ProviderName: "alpha", Pricing: pricing(5, 15)},
ProviderCandidate{ProviderName: "zeta", Pricing: pricing(1, 4)},
)
got := mustSelect(t, s, cands)
if got.ProviderName != "zeta" {
t.Fatalf("expected azure (degraded to cost), got %s", got.ProviderName)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add regression coverage for “non-per-token pricing” candidates.

There’s no test asserting that candidates with pricing metadata but without InputPerMtok/OutputPerMtok are treated as unknown for cost scoring. Please add one to prevent cost-only/balanced regressions in this edge case.

✅ Suggested test case
+func TestCostOnly_DegradesToLatencyWhenNoPerTokenPricing(t *testing.T) {
+	s := NewCostOnlyStrategy()
+	cands := candidates(
+		ProviderCandidate{
+			ProviderName: "alpha",
+			Pricing:      &core.ModelPricing{PerRequest: floatPtr(1)}, // pricing exists, but not per-token
+			Latency:      200 * time.Millisecond,
+		},
+		ProviderCandidate{
+			ProviderName: "zeta",
+			Latency:      50 * time.Millisecond,
+		},
+	)
+	got := mustSelect(t, s, cands)
+	if got.ProviderName != "zeta" {
+		t.Fatalf("expected zeta (degraded to latency when per-token pricing absent), got %s", got.ProviderName)
+	}
+}

As per coding guidelines, "Add or update tests for behavior changes in Go."

Also applies to: 164-197

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/router/strategy_test.go` around lines 103 - 149, Add new test
functions to cover the edge case where candidates have pricing metadata but lack
InputPerMtok and OutputPerMtok values. Create test cases for both
TestCostOnly_DegradesToLatencyWhenPricingHasNoPerTokenCost and
TestLatencyOnly_DegradesToCostWhenPricingHasNoPerTokenCost that use the pricing
helper function but verify the strategies properly degrade to their fallback
metrics (latency for cost-only strategy, cost for latency-only strategy) when
per-token pricing values are absent. This ensures the selection logic correctly
treats incomplete pricing data as missing pricing for scoring purposes.

Source: Coding guidelines

Comment on lines +29 to +31
if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" {
ctx = router.WithStrategyOverride(ctx, strategy)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize strategy override before storing it in context.

Line [29] only trims whitespace; case variants like Latency_Only are treated as invalid and silently dropped later. Normalize to lowercase here to keep override handling permissive.

Suggested minimal fix
-			if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" {
+			if strategy := strings.ToLower(strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader))); strategy != "" {
 				ctx = router.WithStrategyOverride(ctx, strategy)
 			}

As per coding guidelines, **/*.go should accept requests generously and adapt them; based on learnings, this should follow Postel’s Law for tolerant request parsing.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" {
ctx = router.WithStrategyOverride(ctx, strategy)
}
if strategy := strings.ToLower(strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader))); strategy != "" {
ctx = router.WithStrategyOverride(ctx, strategy)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/server/routing_strategy.go` around lines 29 - 31, After trimming
whitespace from the RoutingStrategyHeader using strings.TrimSpace, also
normalize the strategy value to lowercase before passing it to
router.WithStrategyOverride(). This ensures that case variants like Latency_Only
are properly handled and not silently dropped later due to case mismatches,
making the request parsing more permissive and following Postel's Law for
tolerant header handling.

Sources: Coding guidelines, Learnings

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants