feat(router): intelligent cost-aware + latency-aware routing#414
Conversation
📝 WalkthroughWalkthroughAdds an intelligent routing subsystem to the model gateway. When a request does not pin a provider, a configurable strategy ( ChangesIntelligent Routing for Multi-Provider Gateway
Sequence DiagramsequenceDiagram
actor Client
participant Middleware as RoutingStrategyCapture
participant Prepare as inference_prepare.go
participant PRouter as providers/router.go
participant StratReg as StrategyRegistry
participant LLMClient as llmclient.Client
Client->>Middleware: HTTP request (optional X-GoModel-Routing-Strategy/Weights headers)
Middleware->>Middleware: parse headers, apply WithStrategyOverride / WithWeightsOverride to ctx
Middleware->>Prepare: next(ctx)
Prepare->>Prepare: detect no provider pin → WithRoutingEligible(ctx)
Prepare->>PRouter: ResolveModel(ctx, modelID)
PRouter->>PRouter: IsRoutingEligible(ctx)?
PRouter->>PRouter: collectCandidates → buildCandidatesFromEntries
Note over PRouter,LLMClient: ProviderStats (P50/P99/ErrorRate/CircuitState) read from each LLMClient
PRouter->>StratReg: Resolve(ctx) → strategy (override or default)
PRouter->>PRouter: applyWeightsOverride (clone BalancedStrategy if applicable)
PRouter->>StratReg: strategy.Select(ctx, candidates)
StratReg-->>PRouter: *ProviderCandidate (best scored)
PRouter-->>Prepare: resolved provider + selector
Prepare-->>Client: routed request forwarded to best provider
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Confidence Score: 3/5Several core routing paths can choose the wrong provider and should be fixed before merging.
internal/gateway/inference_prepare.go, internal/router/strategy_cost.go, internal/providers/router.go, config/router.go, internal/llmclient/latency.go Reviews (1): Last reviewed commit: "feat(router): add intelligent cost-aware..." | Re-trigger Greptile |
| func hasProviderInModel(model *string) bool { | ||
| if model == nil { | ||
| return false | ||
| } | ||
| m := strings.TrimSpace(*model) | ||
| firstSlash := strings.IndexByte(m, '/') | ||
| // A provider prefix is a non-empty segment before a slash that is NOT itself | ||
| // a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b"). | ||
| return firstSlash > 0 && firstSlash < len(m)-1 |
There was a problem hiding this comment.
This treats any model containing a slash as provider-pinned, but valid unqualified model IDs can also contain slashes, such as meta-llama/Meta-Llama-3-70B. When such a request has an empty Provider, prepare skips WithRoutingEligible, then ApplyResolvedSelector stamps the first resolved provider into the request. The execution path then sees a non-empty provider hint and skips intelligent routing, so these multi-provider models keep the old first-wins behavior.
Context Used: CLAUDE.md (source)
| if _, max := minMaxCost(candidates); max == 0 { | ||
| latMin, latMax := minMaxLatency(candidates) | ||
| scores := normalizeLatency(latencyValues(candidates), latMin, latMax) | ||
| return rankBy(candidates, scores, maxErr) | ||
| } |
There was a problem hiding this comment.
This branch uses max == 0 to mean that no provider has pricing, but known-free models also produce a max cost of zero. With a free but slower provider and an unknown-priced faster provider, cost_only falls into latency ranking and can choose the unknown-priced provider over the known-free one. That makes the cost-only strategy ignore the very pricing sentinel the scoring code uses to distinguish free from unknown.
| eligible := strings.TrimSpace(providerHint) == "" || router.IsRoutingEligible(ctx) | ||
| if eligible && r.strategyRegistry != nil { | ||
| if chosen, newSelector, ok := r.applyStrategy(ctx, selector); ok { | ||
| selector = newSelector | ||
| p = chosen | ||
| } |
There was a problem hiding this comment.
Intelligent routing runs before endpoint capability checks, so an audio or realtime request can be rerouted to a cheaper or faster provider that serves the same model ID but does not implement the required optional interface. For example, CreateSpeech resolves through this path, then routeAudioCall type-asserts core.AudioProvider afterwards and returns model ... does not support audio operations. A model that worked through the first audio-capable provider can start failing because routing selected a provider that is only valid for other endpoints.
Context Used: CLAUDE.md (source)
| if cfg.MaxErrorRate < 0 || cfg.MaxErrorRate > 1 { | ||
| return fmt.Errorf("router.max_error_rate must be in [0, 1], got %v", cfg.MaxErrorRate) | ||
| } |
There was a problem hiding this comment.
The config validator accepts router.max_error_rate: 0 as a valid value, but each strategy later treats MaxErrorRate == 0 as “use the default 0.5”. An operator who configures zero to block any provider with a nonzero smoothed error rate will silently get the default threshold instead, so providers with error rates below 0.5 can still receive traffic.
Context Used: CLAUDE.md (source)
| } | ||
|
|
||
| // Record observes the duration and error status of a completed request. | ||
| // duration is the total request latency; isError indicates whether the | ||
| // request was considered failed for routing purposes. | ||
| func (t *LatencyTracker) Record(duration time.Duration, isError bool) { |
There was a problem hiding this comment.
duration.Milliseconds() truncates any latency below 1ms to zero, and the routing strategies treat zero latency as unknown data. For a very fast local or in-cluster provider, latency-only routing can therefore decide that no latency data exists and fall back to cost ranking, picking a slower provider even though the fast provider has real samples.
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 13
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@config/router.go`:
- Around line 80-88: The current config validation for router weights and
max_error_rate checks for negative and out-of-range values, but does not reject
NaN or other non-finite numbers that strconv.ParseFloat can produce. Add checks
using math.IsNaN() or math.IsInf() (or a helper function) to validate that
cfg.Weights.Cost, cfg.Weights.Latency, and cfg.MaxErrorRate are finite numbers.
Include these finite-value checks alongside the existing negative and range
validation checks in the error return statements, and apply the same validation
logic to the similar checks mentioned at lines 110-115.
In `@internal/gateway/inference_prepare.go`:
- Around line 231-243: The hasProviderInModel function currently returns true
for any model string containing a slash, but it should distinguish between
actual provider prefixes like "deepseek/deepseek-v4-flash" and plain model IDs
containing slashes like "meta-llama/Llama-3-70b". Update the return statement to
add additional logic beyond the current firstSlash check to properly identify
whether the segment before the slash is an actual provider prefix or just part
of a model ID name. This will ensure plain model IDs with slashes are not
incorrectly marked as provider-pinned and can properly use intelligent routing.
In `@internal/llmclient/latency_test.go`:
- Around line 28-37: Add a new regression test function (e.g.,
TestLatencyTracker_LeadingSuccessesDoNotSkewErrorRate) after the existing
TestLatencyTracker_ErrorRateReflectsFailures test that verifies ErrorRate()
behavior when multiple successful samples are recorded before a single failure
occurs. The test should create a NewLatencyTracker, record several successful
samples by calling Record with appropriate parameters indicating success (false
or 0), then record one failure, and finally assert that ErrorRate() returns a
value that does not spike to 1.0 but remains below the routing cutoff threshold,
demonstrating that a single failure after many successes does not incorrectly
skew the error rate calculation.
In `@internal/llmclient/latency.go`:
- Around line 15-19: The ewma struct uses e.value == 0 to detect the first
sample, which incorrectly handles valid zero-valued samples (such as error rate
of 0 for successful requests). Add a boolean field to the ewma struct to
explicitly track whether the EWMA has been initialized. Then update any methods
that check e.value == 0 as an initialization condition (mentioned in lines
30-38) to instead check this new boolean flag. This ensures the first sample is
properly initialized regardless of its value.
In `@internal/providers/registry.go`:
- Around line 717-725: The ListProvidersForModel method documentation states it
returns an empty slice when the model is not found, but the current
implementation returns nil instead. Replace the nil return statement in the
early exit condition (when modelID is empty or no providers are registered) with
an empty slice of type ModelWithProvider to honor the documented API contract
and eliminate nil-shape ambiguity.
In `@internal/providers/router_strategy_integration_test.go`:
- Around line 176-202: The test
TestRouter_IntelligentRouting_ExplicitProviderHintBypasses currently only
validates provider pinning through the explicit Provider field, but does not
cover provider pinning through the model syntax format. Add a second
ChatCompletion call within the same test that uses Model: "openai-west/gpt-4o"
with Provider left empty to verify that explicit provider pins specified via the
model-qualified syntax are also respected by the routing strategy and not
overridden based on cost. Assert that the response ID is still "west-resp" to
confirm the expensive provider is selected when pinned through the model syntax,
ensuring the strategy does not bypass such explicit provider qualifications.
In `@internal/providers/router.go`:
- Around line 309-317: The eligible variable calculation on line 315 only checks
if providerHint is empty but fails to account for explicit provider
specifications using the provider/model syntax. When a user pins a provider by
including it in the model string (e.g., "provider/model"), that explicit choice
should prevent re-routing through the strategy registry. Modify the eligible
variable assignment to also check whether the model or selector contains an
explicit provider specification, and only mark requests as eligible for
re-routing when no provider is pinned either through providerHint or through the
provider/model syntax in the selector.
In `@internal/router/context.go`:
- Around line 35-46: The WithStrategyOverride function currently stores empty
strings in the context, but the documentation states that empty names should
clear any prior override. Modify the WithStrategyOverride function to check if
the strategy parameter is empty and, if so, delete the value from the context
(or return the original context unchanged) rather than storing an empty string.
This ensures that StrategyOverrideFromContext returns the expected (empty
string, false) when no override is set, matching the documented behavior.
In `@internal/router/registry_test.go`:
- Around line 68-75: The TestParseWeights_Invalid test function is missing
coverage for edge cases like trailing garbage and non-finite float values.
Expand the cases slice in the TestParseWeights_Invalid function to include
additional invalid weight strings such as strings with trailing garbage (e.g.,
"0.6x,0.4"), and non-finite float representations (e.g., "NaN,0.4", "Inf,0.4").
This ensures the ParseWeights function properly handles and rejects these
malformed inputs as part of its error handling path, making the parser behavior
regression-proof.
In `@internal/router/registry.go`:
- Around line 98-106: The weight parsing logic in the function that handles Cost
and Latency weights uses fmt.Sscanf which has two issues: it accepts non-finite
values like NaN and Inf, and it performs only partial input matching (e.g.,
"0.6foo" would be parsed as 0.6). To fix this, after parsing each weight value
with fmt.Sscanf for both Cost and Latency, verify that the entire trimmed string
was consumed by the parse operation and reject any non-finite values (NaN or
Inf) using math.IsNaN and math.IsInf checks. Add these validation checks
immediately after each fmt.Sscanf call before the existing non-negative
validation.
In `@internal/router/strategy_balanced.go`:
- Around line 39-45: The zero-value defaulting logic for balanced weights in the
strategy initialization is incorrect because the first condition modifies costW
to 0.6, which then causes the second condition to fail since it checks if costW
== 0. Fix this by changing the second condition from checking costW == 0 to
checking s.CostWeight == 0 instead, so it evaluates the original unmodified
value. This ensures that when both s.CostWeight and s.LatencyWeight are zero,
both costW and latW are properly set to their default values of 0.6 and 0.4
respectively.
In `@internal/router/strategy_cost.go`:
- Around line 33-37: The current implementation in the pricing strategy uses
`max == 0` from minMaxCost() as a sentinel to detect missing pricing data, but
this incorrectly treats legitimate zero-cost pricing as absent pricing and
degrades to latency-only ranking. Create a new helper function (e.g.,
hasPricingData) that directly checks whether pricing information is actually
present for the candidates, rather than inferring presence from numeric extrema.
Replace the condition `if _, max := minMaxCost(candidates); max == 0` with a
call to this new helper function to properly distinguish between zero cost and
missing cost data.
In `@internal/router/strategy_test.go`:
- Around line 115-124: The TestCostOnly_DegradesToLatencyWhenNoPricing test can
still pass via lexicographic tie-break even if the degradation fallback logic
regresses, because both the correct degradation and the lexicographic ordering
would select the same provider. Change the provider names in this test (and the
similar degradation test mentioned in the also-applies-to comment around line
139-148) so that the provider with better latency/lower cost comes
lexicographically after the other one, making fallback and tie-break produce
different winners and ensuring the test truly validates the degradation logic.
Additionally, add a new test for zero-value BalancedStrategy that verifies it
correctly defaults to 0.6/0.4 weight distribution when initialized without
explicit configuration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8563b06f-d85e-4bbe-8889-05a534a1c750
📒 Files selected for processing (27)
.env.templateCLAUDE.mdconfig/config.example.yamlconfig/config.goconfig/router.gointernal/core/interfaces.gointernal/gateway/inference_prepare.gointernal/llmclient/client.gointernal/llmclient/latency.gointernal/llmclient/latency_test.gointernal/providers/init.gointernal/providers/registry.gointernal/providers/router.gointernal/providers/router_strategy_integration_test.gointernal/providers/strategy_init.gointernal/router/context.gointernal/router/registry.gointernal/router/registry_test.gointernal/router/scoring.gointernal/router/strategy.gointernal/router/strategy_balanced.gointernal/router/strategy_cost.gointernal/router/strategy_firstfit.gointernal/router/strategy_latency.gointernal/router/strategy_test.gointernal/server/http.gointernal/server/routing_strategy.go
| // hasProviderInModel reports whether the model string contains a provider | ||
| // prefix (e.g. "deepseek/deepseek-v4-flash"), which means the user explicitly | ||
| // chose a provider via the model syntax even though req.Provider is empty. | ||
| func hasProviderInModel(model *string) bool { | ||
| if model == nil { | ||
| return false | ||
| } | ||
| m := strings.TrimSpace(*model) | ||
| firstSlash := strings.IndexByte(m, '/') | ||
| // A provider prefix is a non-empty segment before a slash that is NOT itself | ||
| // a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b"). | ||
| return firstSlash > 0 && firstSlash < len(m)-1 | ||
| } |
There was a problem hiding this comment.
hasProviderInModel misclassifies slash-style model IDs as provider-pinned.
Line 239 treats any x/y as provider-qualified, so plain model IDs containing / are marked as pinned and never get WithRoutingEligible. That skips intelligent routing for valid unpinned requests.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/gateway/inference_prepare.go` around lines 231 - 243, The
hasProviderInModel function currently returns true for any model string
containing a slash, but it should distinguish between actual provider prefixes
like "deepseek/deepseek-v4-flash" and plain model IDs containing slashes like
"meta-llama/Llama-3-70b". Update the return statement to add additional logic
beyond the current firstSlash check to properly identify whether the segment
before the slash is an actual provider prefix or just part of a model ID name.
This will ensure plain model IDs with slashes are not incorrectly marked as
provider-pinned and can properly use intelligent routing.
| func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) { | ||
| tr := NewLatencyTracker() | ||
| for i := 0; i < 5; i++ { | ||
| tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5 | ||
| } | ||
| rate := tr.ErrorRate() | ||
| if rate < 0.3 || rate > 0.9 { | ||
| t.Fatalf("expected error rate near 0.4-0.6, got %v", rate) | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add a regression test for leading-success error-rate initialization.
Please add a case where several successful samples precede one failure; ErrorRate() should remain below the routing cutoff after a single failure, not jump to 1.0.
Suggested test
func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) {
tr := NewLatencyTracker()
for i := 0; i < 5; i++ {
tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5
}
rate := tr.ErrorRate()
if rate < 0.3 || rate > 0.9 {
t.Fatalf("expected error rate near 0.4-0.6, got %v", rate)
}
}
+
+func TestLatencyTracker_ErrorRateLeadingSuccessesThenOneFailure(t *testing.T) {
+ tr := NewLatencyTracker()
+ for i := 0; i < 5; i++ {
+ tr.Record(10*time.Millisecond, false)
+ }
+ tr.Record(10*time.Millisecond, true)
+
+ if got := tr.ErrorRate(); got >= 0.5 {
+ t.Fatalf("expected error rate below 0.5 after one failure following successes, got %v", got)
+ }
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) { | |
| tr := NewLatencyTracker() | |
| for i := 0; i < 5; i++ { | |
| tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5 | |
| } | |
| rate := tr.ErrorRate() | |
| if rate < 0.3 || rate > 0.9 { | |
| t.Fatalf("expected error rate near 0.4-0.6, got %v", rate) | |
| } | |
| } | |
| func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) { | |
| tr := NewLatencyTracker() | |
| for i := 0; i < 5; i++ { | |
| tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5 | |
| } | |
| rate := tr.ErrorRate() | |
| if rate < 0.3 || rate > 0.9 { | |
| t.Fatalf("expected error rate near 0.4-0.6, got %v", rate) | |
| } | |
| } | |
| func TestLatencyTracker_ErrorRateLeadingSuccessesThenOneFailure(t *testing.T) { | |
| tr := NewLatencyTracker() | |
| for i := 0; i < 5; i++ { | |
| tr.Record(10*time.Millisecond, false) | |
| } | |
| tr.Record(10*time.Millisecond, true) | |
| if got := tr.ErrorRate(); got >= 0.5 { | |
| t.Fatalf("expected error rate below 0.5 after one failure following successes, got %v", got) | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/llmclient/latency_test.go` around lines 28 - 37, Add a new
regression test function (e.g.,
TestLatencyTracker_LeadingSuccessesDoNotSkewErrorRate) after the existing
TestLatencyTracker_ErrorRateReflectsFailures test that verifies ErrorRate()
behavior when multiple successful samples are recorded before a single failure
occurs. The test should create a NewLatencyTracker, record several successful
samples by calling Record with appropriate parameters indicating success (false
or 0), then record one failure, and finally assert that ErrorRate() returns a
value that does not spike to 1.0 but remains below the routing cutoff threshold,
demonstrating that a single failure after many successes does not incorrectly
skew the error rate calculation.
Review findings assessmentFixed
Assessed as not actionable
|
d33b07f to
dcfce08
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (2)
internal/llmclient/latency_test.go (1)
28-37: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd a regression test for “many successes, then one failure” error-rate behavior.
This specific path is still untested and is the one that previously regressed.
As per coding guidelines, `**/*_test.go` changes should include behavior-focused coverage for updated logic paths.Suggested test case
func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) { tr := NewLatencyTracker() for i := 0; i < 5; i++ { tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5 } rate := tr.ErrorRate() if rate < 0.3 || rate > 0.9 { t.Fatalf("expected error rate near 0.4-0.6, got %v", rate) } } + +func TestLatencyTracker_ErrorRateLeadingSuccessesThenOneFailure(t *testing.T) { + tr := NewLatencyTracker() + for i := 0; i < 5; i++ { + tr.Record(10*time.Millisecond, false) + } + tr.Record(10*time.Millisecond, true) + if got := tr.ErrorRate(); got >= 0.5 { + t.Fatalf("expected error rate below 0.5 after one late failure, got %v", got) + } +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/llmclient/latency_test.go` around lines 28 - 37, Add a new regression test function (separate from TestLatencyTracker_ErrorRateReflectsFailures) to specifically test the "many successes followed by one failure" error-rate calculation path. This test should create a fresh LatencyTracker, record multiple successful calls (with the success parameter as true), then record a single failure, and verify that the error rate is calculated correctly for this specific scenario. This ensures the error rate logic handles the case of predominantly successful operations with a single failure at the end, which is the code path that previously regressed.Source: Coding guidelines
internal/router/context.go (1)
35-49:⚠️ Potential issue | 🟠 Major | ⚡ Quick winEmpty strategy override does not actually clear inherited context override.
WithStrategyOverride(ctx, "")returnsctxunchanged, so any parent strategy override remains active despite the “empty clears override” contract. This can keep an unintended override in effect duringStrategyRegistry.Resolve.Suggested minimal fix
-import "context" +import ( + "context" + "strings" +) @@ func StrategyOverrideFromContext(ctx context.Context) (string, bool) { s, ok := ctx.Value(strategyCtxKey{}).(string) - return s, ok + if !ok { + return "", false + } + s = strings.TrimSpace(s) + if s == "" { + return "", false + } + return s, true } @@ func WithStrategyOverride(ctx context.Context, strategy string) context.Context { - if strategy == "" { - return ctx + strategy = strings.TrimSpace(strategy) + if strategy == "" { + // Shadow any inherited override as absent. + return context.WithValue(ctx, strategyCtxKey{}, nil) } return context.WithValue(ctx, strategyCtxKey{}, strategy) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/router/context.go` around lines 35 - 49, The WithStrategyOverride function returns the context unchanged when strategy is an empty string, which fails to clear inherited overrides from parent contexts. To fix this, remove the early return condition that checks if strategy is empty, and instead always execute the context.WithValue call regardless of whether strategy is empty or not. This ensures that an empty strategy value is properly set in the current context layer, effectively shadowing and overriding any parent context value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CLAUDE.md`:
- Line 134: The "Intelligent routing" documentation section on line 134 is a
single overly dense bullet containing multiple distinct concepts (routing
strategy defaults, configuration options, EWMA tracking details, header
overrides, and explicit provider behavior). Break this into 4-6 focused
sub-bullets, each addressing one specific topic: one for the routing strategy
configuration and defaults, one for the scoring weights, one for error rate
filtering, one for EWMA latency tracking mechanics, one for request header
overrides, and one for explicit provider-qualified request bypass behavior. This
will improve scannability and make it easier for operators to find and
understand each configuration option independently.
In `@config/router.go`:
- Around line 61-65: The empty-check for cfg.Strategy on line 61 occurs before
trimming and converting to lowercase on line 64, which means a whitespace-only
value like " " will pass the empty check and later fail validation instead of
defaulting to RouterStrategyBalanced. Move the strings.TrimSpace and
strings.ToLower operations before the empty-check for cfg.Strategy so that
whitespace-only values are properly caught and default to the balanced strategy
as intended.
In `@internal/llmclient/client.go`:
- Around line 231-234: The completeScope function has an inconsistency where the
latencyTracker.Record call uses err != nil to determine failure, but the circuit
breaker completion tracking uses cbErr. This causes routing error-rate telemetry
to miss failures when cbErr is set but err is nil. Update the
latencyTracker.Record call to check for failures consistently by accounting for
both err and cbErr, ensuring that any error tracked by the circuit breaker is
also reflected in the latency telemetry.
In `@internal/router/registry_test.go`:
- Around line 58-75: The TestParseWeights_Valid function only validates a single
canonical numeric format. Add additional test cases to this function to verify
that ParseWeights correctly handles equivalent numeric representations like
"0.70,0.30" and "7e-1,3e-1" which should all parse successfully and produce the
same Weight values (0.7 and 0.3 for Cost and Latency respectively). Update the
test to use a table-driven approach or add separate test cases for each format
variation to ensure regression-proof parser behavior.
In `@internal/router/registry.go`:
- Around line 101-113: The validation logic in the weight parsing code is overly
restrictive by comparing the original input strings (costStr and latStr) against
their formatted parsed values using fmt.Sprintf. This check rejects valid
numeric formats like 0.60 (which becomes 0.6) or scientific notation like 6e-1.
Since fmt.Sscanf has already successfully validated and parsed the input into a
float value, remove the string comparison checks that verify costStr matches
fmt.Sprintf("%v", cost) and latStr matches fmt.Sprintf("%v", lat). Keep only the
fmt.Sscanf validation and error handling, as it is sufficient to ensure the
input is a valid numeric weight.
In `@internal/server/routing_strategy.go`:
- Around line 33-35: The ParseWeights function in internal/router/registry.go is
performing canonical string matching after float parsing, which causes valid
decimal weight headers like "1.0,0.0" to be rejected. Modify the ParseWeights
function to be more tolerant of different valid decimal formats by normalizing
them during parsing instead of performing strict canonical matching. This aligns
with the principle to accept requests generously and adapt them according to
Postel's Law, allowing decimal values in various formats (like "1.0,0.0", "1,0",
etc.) to be accepted and properly parsed without silently ignoring valid input.
---
Duplicate comments:
In `@internal/llmclient/latency_test.go`:
- Around line 28-37: Add a new regression test function (separate from
TestLatencyTracker_ErrorRateReflectsFailures) to specifically test the "many
successes followed by one failure" error-rate calculation path. This test should
create a fresh LatencyTracker, record multiple successful calls (with the
success parameter as true), then record a single failure, and verify that the
error rate is calculated correctly for this specific scenario. This ensures the
error rate logic handles the case of predominantly successful operations with a
single failure at the end, which is the code path that previously regressed.
In `@internal/router/context.go`:
- Around line 35-49: The WithStrategyOverride function returns the context
unchanged when strategy is an empty string, which fails to clear inherited
overrides from parent contexts. To fix this, remove the early return condition
that checks if strategy is empty, and instead always execute the
context.WithValue call regardless of whether strategy is empty or not. This
ensures that an empty strategy value is properly set in the current context
layer, effectively shadowing and overriding any parent context value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 619e37f2-39dc-4d85-a0ff-6c5c614cbd21
📒 Files selected for processing (27)
.env.templateCLAUDE.mdconfig/config.example.yamlconfig/config.goconfig/router.gointernal/core/interfaces.gointernal/gateway/inference_prepare.gointernal/llmclient/client.gointernal/llmclient/latency.gointernal/llmclient/latency_test.gointernal/providers/init.gointernal/providers/registry.gointernal/providers/router.gointernal/providers/router_strategy_integration_test.gointernal/providers/strategy_init.gointernal/router/context.gointernal/router/registry.gointernal/router/registry_test.gointernal/router/scoring.gointernal/router/strategy.gointernal/router/strategy_balanced.gointernal/router/strategy_cost.gointernal/router/strategy_firstfit.gointernal/router/strategy_latency.gointernal/router/strategy_test.gointernal/server/http.gointernal/server/routing_strategy.go
| - **Guardrails:** Configured via `config/config.yaml` only (except `GUARDRAILS_ENABLED` env var) | ||
| - **Providers:** `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `USE_GOOGLE_GEMINI_NATIVE_API` (true by default; false uses Gemini's OpenAI-compatible chat API), `XAI_API_KEY`, `GROQ_API_KEY`, `OPENROUTER_API_KEY`, `ZAI_API_KEY`, `ZAI_BASE_URL` (optional Z.ai endpoint override), `MINIMAX_API_KEY`, `MINIMAX_BASE_URL` (optional MiniMax endpoint override), `XIAOMI_API_KEY`, `XIAOMI_BASE_URL` (optional Xiaomi MiMo endpoint override), `OPENCODE_GO_API_KEY`, `OPENCODE_GO_BASE_URL` (optional OpenCode Go/Zen endpoint override; default `https://opencode.ai/zen/go/v1`), `OPENCODE_GO_MESSAGES_MODELS` (optional comma-separated model IDs routed to the Anthropic-native `/messages` endpoint instead of `/chat/completions`; default `qwen3.7-max`), `BAILIAN_API_KEY`, `BAILIAN_BASE_URL` (optional Bailian base URL for region switching; default `https://dashscope.aliyuncs.com/compatible-mode/v1`), `AZURE_API_KEY`, `AZURE_BASE_URL` (Azure OpenAI deployment base URL), `AZURE_API_VERSION` (optional Azure API version), `ORACLE_API_KEY` (Oracle API key), `ORACLE_BASE_URL` (Oracle OpenAI-compatible base URL), `<PROVIDER>[_SUFFIX]_MODELS` (comma-separated configured model list for any provider type), `OLLAMA_BASE_URL`, `VLLM_BASE_URL`, `VLLM_API_KEY` (optional upstream vLLM bearer token) | ||
| - **Provider model metadata:** `providers.<name>.models` accepts either model IDs (strings) or `{id, metadata}` objects. When `metadata` is supplied (`display_name`, `context_window`, `max_output_tokens`, `modes`, `capabilities`, `pricing`, …) it is merged onto the remote ai-model-list entry during enrichment, with operator values winning per-field. Primary use case: advertising context windows, capabilities, and pricing for local models (Ollama) and other custom endpoints whose IDs are not in the upstream registry. | ||
| - **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, the gateway scores candidates and routes to the best one (single-provider configs are unaffected — one candidate means no decision). `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`), `MODEL_ROUTING_STRATEGY_WEIGHTS` (`0.6,0.4` = cost_weight,latency_weight; balanced only), `MODEL_ROUTING_MAX_ERROR_RATE` (`0.5`; candidates at/above this smoothed error rate are filtered before scoring). Latency is tracked per provider as in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate), never persisted. Per-request overrides via request headers `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` (`cost,latency`); invalid header values are silently dropped (WARN log) and fall back to the global strategy. Explicit provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing. |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Split this into short sub-bullets for scanability.
Line 134 packs defaults, runtime behavior, EWMA internals, and header override semantics into one dense line. Breaking it into 4–6 bullets will make operator usage much easier to parse.
Suggested structure
-- **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, ...
+- **Intelligent routing:** enabled when request has no pinned provider and multiple providers serve the same model.
+ - **Defaults:** `MODEL_ROUTING_STRATEGY=balanced`, `MODEL_ROUTING_STRATEGY_WEIGHTS=0.6,0.4`, `MODEL_ROUTING_MAX_ERROR_RATE=0.5`
+ - **Filtering:** candidates at/above `max_error_rate` are excluded before scoring.
+ - **Latency stats:** in-memory EWMA only (not persisted).
+ - **Per-request overrides:** `X-GoModel-Routing-Strategy`, `X-GoModel-Routing-Weights`; invalid values are ignored with WARN logs.
+ - **Bypass rule:** provider-qualified requests like `openai-east/gpt-4o` bypass intelligent routing.As per coding guidelines, “Documentation should be concise, practical, and user-focused. Show defaults, explain when to change them, and include minimal examples when useful.”
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, the gateway scores candidates and routes to the best one (single-provider configs are unaffected — one candidate means no decision). `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`), `MODEL_ROUTING_STRATEGY_WEIGHTS` (`0.6,0.4` = cost_weight,latency_weight; balanced only), `MODEL_ROUTING_MAX_ERROR_RATE` (`0.5`; candidates at/above this smoothed error rate are filtered before scoring). Latency is tracked per provider as in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate), never persisted. Per-request overrides via request headers `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` (`cost,latency`); invalid header values are silently dropped (WARN log) and fall back to the global strategy. Explicit provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing. | |
| - **Intelligent routing:** enabled when request has no pinned provider and multiple providers serve the same model. | |
| - **Defaults:** `MODEL_ROUTING_STRATEGY=balanced`, `MODEL_ROUTING_STRATEGY_WEIGHTS=0.6,0.4`, `MODEL_ROUTING_MAX_ERROR_RATE=0.5` | |
| - **Filtering:** candidates at/above `max_error_rate` are excluded before scoring. | |
| - **Latency stats:** in-memory EWMA only (not persisted). | |
| - **Per-request overrides:** `X-GoModel-Routing-Strategy`, `X-GoModel-Routing-Weights`; invalid values are ignored with WARN logs. | |
| - **Bypass rule:** provider-qualified requests like `openai-east/gpt-4o` bypass intelligent routing. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@CLAUDE.md` at line 134, The "Intelligent routing" documentation section on
line 134 is a single overly dense bullet containing multiple distinct concepts
(routing strategy defaults, configuration options, EWMA tracking details, header
overrides, and explicit provider behavior). Break this into 4-6 focused
sub-bullets, each addressing one specific topic: one for the routing strategy
configuration and defaults, one for the scoring weights, one for error rate
filtering, one for EWMA latency tracking mechanics, one for request header
overrides, and one for explicit provider-qualified request bypass behavior. This
will improve scannability and make it easier for operators to find and
understand each configuration option independently.
Source: Coding guidelines
| func (c *Client) completeScope(scope requestScope, statusCode int, err, cbErr error) { | ||
| c.recordCircuitBreakerCompletion(statusCode, cbErr) | ||
| c.latencyTracker.Record(time.Since(scope.startedAt), err != nil) | ||
| c.finishRequest(scope, statusCode, err) |
There was a problem hiding this comment.
Error-rate telemetry can miss failures tracked by circuit-breaker path.
completeScope records routing error EWMA with err != nil, but CB accounting uses cbErr. If cbErr is set while err is nil, routing error-rate remains artificially low.
Suggested minimal alignment
func (c *Client) completeScope(scope requestScope, statusCode int, err, cbErr error) {
c.recordCircuitBreakerCompletion(statusCode, cbErr)
- c.latencyTracker.Record(time.Since(scope.startedAt), err != nil)
+ c.latencyTracker.Record(time.Since(scope.startedAt), err != nil || cbErr != nil)
c.finishRequest(scope, statusCode, err)
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/llmclient/client.go` around lines 231 - 234, The completeScope
function has an inconsistency where the latencyTracker.Record call uses err !=
nil to determine failure, but the circuit breaker completion tracking uses
cbErr. This causes routing error-rate telemetry to miss failures when cbErr is
set but err is nil. Update the latencyTracker.Record call to check for failures
consistently by accounting for both err and cbErr, ensuring that any error
tracked by the circuit breaker is also reflected in the latency telemetry.
| func TestParseWeights_Valid(t *testing.T) { | ||
| w, err := ParseWeights("0.7,0.3") | ||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
| if w.Cost != 0.7 || w.Latency != 0.3 { | ||
| t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency) | ||
| } | ||
| } | ||
|
|
||
| func TestParseWeights_Invalid(t *testing.T) { | ||
| cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"} | ||
| for _, c := range cases { | ||
| if _, err := ParseWeights(c); err == nil { | ||
| t.Errorf("expected error for %q", c) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add positive parsing cases for equivalent numeric formats.
TestParseWeights_Valid currently validates only one canonical representation. Please add accepted-format cases like 0.70,0.30 and 7e-1,3e-1 so parser behavior is regression-proof after the ParseWeights fix.
💡 Minimal test update
func TestParseWeights_Valid(t *testing.T) {
- w, err := ParseWeights("0.7,0.3")
- if err != nil {
- t.Fatalf("unexpected error: %v", err)
- }
- if w.Cost != 0.7 || w.Latency != 0.3 {
- t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency)
+ cases := []string{
+ "0.7,0.3",
+ "0.70,0.30",
+ "7e-1,3e-1",
+ }
+ for _, tc := range cases {
+ w, err := ParseWeights(tc)
+ if err != nil {
+ t.Fatalf("unexpected error for %q: %v", tc, err)
+ }
+ if w.Cost != 0.7 || w.Latency != 0.3 {
+ t.Fatalf("expected 0.7,0.3 for %q got %v,%v", tc, w.Cost, w.Latency)
+ }
}
}As per coding guidelines, tests should be updated for behavior changes and parser error-handling boundaries.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func TestParseWeights_Valid(t *testing.T) { | |
| w, err := ParseWeights("0.7,0.3") | |
| if err != nil { | |
| t.Fatalf("unexpected error: %v", err) | |
| } | |
| if w.Cost != 0.7 || w.Latency != 0.3 { | |
| t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency) | |
| } | |
| } | |
| func TestParseWeights_Invalid(t *testing.T) { | |
| cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"} | |
| for _, c := range cases { | |
| if _, err := ParseWeights(c); err == nil { | |
| t.Errorf("expected error for %q", c) | |
| } | |
| } | |
| } | |
| func TestParseWeights_Valid(t *testing.T) { | |
| cases := []string{ | |
| "0.7,0.3", | |
| "0.70,0.30", | |
| "7e-1,3e-1", | |
| } | |
| for _, tc := range cases { | |
| w, err := ParseWeights(tc) | |
| if err != nil { | |
| t.Fatalf("unexpected error for %q: %v", tc, err) | |
| } | |
| if w.Cost != 0.7 || w.Latency != 0.3 { | |
| t.Fatalf("expected 0.7,0.3 for %q got %v,%v", tc, w.Cost, w.Latency) | |
| } | |
| } | |
| } | |
| func TestParseWeights_Invalid(t *testing.T) { | |
| cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"} | |
| for _, c := range cases { | |
| if _, err := ParseWeights(c); err == nil { | |
| t.Errorf("expected error for %q", c) | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/router/registry_test.go` around lines 58 - 75, The
TestParseWeights_Valid function only validates a single canonical numeric
format. Add additional test cases to this function to verify that ParseWeights
correctly handles equivalent numeric representations like "0.70,0.30" and
"7e-1,3e-1" which should all parse successfully and produce the same Weight
values (0.7 and 0.3 for Cost and Latency respectively). Update the test to use a
table-driven approach or add separate test cases for each format variation to
ensure regression-proof parser behavior.
Source: Coding guidelines
| if weights := strings.TrimSpace(c.Request().Header.Get(RoutingWeightsHeader)); weights != "" { | ||
| if override, err := router.ParseWeights(weights); err != nil { | ||
| slog.WarnContext(ctx, "invalid routing weights header, ignoring", |
There was a problem hiding this comment.
Valid decimal weight headers are being rejected in the override path.
Line 34 depends on router.ParseWeights, which currently performs canonical string matching after float parsing; values like 1.0,0.0 get treated as invalid and silently ignored. That breaks common operator input and conflicts with the “accept requests generously” contract.
Suggested fix (in internal/router/registry.go)
-import "fmt"
+import (
+ "fmt"
+ "strconv"
+)
- n, err := fmt.Sscanf(costStr, "%f", &cost)
- if err != nil || n != 1 {
+ cost, err := strconv.ParseFloat(costStr, 64)
+ if err != nil {
return WeightsOverride{}, fmt.Errorf("invalid cost weight %q: parse error", costStr)
}
- if costStr != fmt.Sprintf("%v", cost) {
- return WeightsOverride{}, fmt.Errorf("invalid cost weight %q: trailing characters", parts[0])
- }
- n, err = fmt.Sscanf(latStr, "%f", &lat)
- if err != nil || n != 1 {
+ lat, err := strconv.ParseFloat(latStr, 64)
+ if err != nil {
return WeightsOverride{}, fmt.Errorf("invalid latency weight %q: parse error", latStr)
}
- if latStr != fmt.Sprintf("%v", lat) {
- return WeightsOverride{}, fmt.Errorf("invalid latency weight %q: trailing characters", parts[1])
- }As per coding guidelines, **/*.go should “Accept requests generously … and adapt them”; based on learnings, this should follow Postel’s Law for tolerant request parsing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/server/routing_strategy.go` around lines 33 - 35, The ParseWeights
function in internal/router/registry.go is performing canonical string matching
after float parsing, which causes valid decimal weight headers like "1.0,0.0" to
be rejected. Modify the ParseWeights function to be more tolerant of different
valid decimal formats by normalizing them during parsing instead of performing
strict canonical matching. This aligns with the principle to accept requests
generously and adapt them according to Postel's Law, allowing decimal values in
various formats (like "1.0,0.0", "1,0", etc.) to be accepted and properly parsed
without silently ignoring valid input.
Sources: Coding guidelines, Learnings
dcfce08 to
d07a10e
Compare
This feature enables the gateway to proactively select the best provider
when multiple providers serve the same model, combining cost and latency
signals into a weighted score. Previously the gateway used a deterministic
first-wins strategy: the first registered provider always handled the
request, regardless of cost or performance.
=== Architecture ===
- Pluggable RoutingStrategy interface (balanced, cost_only, latency_only,
first_fit) with extensible registry
- BalancedStrategy (default): weighted scoring (cost 0.6, latency 0.4),
lower score wins; missing data scores 1.0 (worst) to prefer
complete-data providers; ties broken by provider name lexicographic order
- CostOnlyStrategy / LatencyOnlyStrategy: single-factor ranking, degrade
to the other dimension when every candidate lacks their primary signal
(without the mutual-recursion bug that was identified and fixed)
- FirstFitStrategy: first acceptable candidate in registration order,
matching historical first-wins behaviour
=== Latency data collection ===
- EWMA-based LatencyTracker in llmclient.Client (in-memory, ~32 bytes per
provider, alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate, never persisted)
- ProviderStats optional interface exposed by llmclient.Client so the
router can read P50Latency, P99Latency, ErrorRate, and CircuitState
via a single type assertion
- completeScope() is the single Record() call site
=== Candidate selection ===
- collectCandidates(modelID) enumerates providers serving modelID via the
new O(k) ListProvidersForModel registry method (registration-order aware)
- Candidates with circuit open or error rate at/above 0.5 are filtered
before scoring. When every candidate is filtered the caller falls back
to the first candidate with a WARN log
- Explicit provider-qualified requests bypass intelligent routing;
single-provider configs are unaffected (one candidate, no decision)
=== Per-request overrides ===
- X-GoModel-Routing-Strategy and X-GoModel-Routing-Weights headers;
invalid values silently fall back (WARN log)
- RoutingEligible context flag set during prepare for unqualified requests
=== Config ===
- MODEL_ROUTING_STRATEGY, MODEL_ROUTING_STRATEGY_WEIGHTS,
MODEL_ROUTING_MAX_ERROR_RATE env vars; router block in config.yaml
=== Bugs fixed during development ===
1. Mutual recursion: cost/latency strategies recursed when both
dimensions absent. Fixed by computing fallback scores inline
2. Free model mis-classification: normalizeCost treated cost=0 as
unknown. Fixed with -1 sentinel
3. Production path never executed strategy: PrepareChatRequest set
req.Provider, resolveProvider skipped due to providerHint != "".
Fixed via RoutingEligible context flag
4. FirstFit used alpha-sorted order instead of registration order.
Fixed via ListProvidersForModel with registration-order tracking
5. BalancedStrategy zero-weight default used modified variable instead
of original, causing LatencyWeight to stay 0 when both weights were 0
6. CostOnly degraded when free model present (max==0 sentinel).
Fixed with explicit hasPricingData() helper
7. ParseWeights accepted NaN/Inf/trailing-garbage. Fixed with math
validation and full-string consumption check
8. Model-syntax provider ("pricey/gpt-4o") overridden when req.Provider
was empty. Fixed with slash-in-model check in resolveProvider
=== Testing ===
- 38+ unit tests covering every strategy, edge cases, registry,
context helpers, and EWMA convergence
- 9 integration tests covering cheaper/faster selection, header
overrides, invalid override fallback, explicit provider bypass,
single-candidate no-op, weights override, FirstFit registration
order, and model-syntax provider pinning
- lint: 0 issues, full project build passes
d07a10e to
8ea9477
Compare
|
Additional fix pushed:
All previous findings (balanced zero-weight bug, free pricing degradation, NaN/Inf validation, EWMA init flag, model-syntax provider override, etc.) remain fixed in the squashed commit. |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
coverage-bailian (1)
1-36:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove generated test coverage artifact from version control.
The
coverage-bailianfile is a Go test coverage profile generated at test time viago test -coverprofile=...(per Makefile line 51). Generated artifacts should never be committed to version control because:
- Causes constant diffs: Each test run produces different coverage data (different lines executed, percentages change).
- Obscures real changes: Coverage artifacts clutter PR diffs and hide actual code changes from review.
- Violates twelve-factor principles: Generated build outputs belong in CI/CD artifact storage, not git history.
- Risk of merge conflicts: Multiple developers/CI runs will cause conflicting coverage changes.
Action: Remove this file from the repository and add it to
.gitignorealongside the standardcoverage.outand other generated test artifacts. If coverage tracking is needed, store baselines in CI/CD systems or external coverage dashboards (Codecov, Coveralls, etc.), not git.[major_issue]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@coverage-bailian` around lines 1 - 36, Remove the coverage-bailian file from version control as it is a generated test coverage artifact that should not be committed. Additionally, add coverage-bailian to the .gitignore file to prevent it from being tracked in future commits, grouping it with other generated coverage files like coverage.out and similar test artifacts. Use git to remove the file from tracking while keeping it locally if needed for local testing purposes.Source: Learnings
♻️ Duplicate comments (1)
CLAUDE.md (1)
134-134: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winSplit this dense line into focused sub-bullets for scannability.
Line 134 packs routing strategy defaults, weight configuration, error-rate filtering, EWMA tracking mechanics, header overrides, and explicit provider bypass logic into one wrapped sentence. Breaking it into 4–6 sub-bullets will make operator usage much easier to understand and reference independently.
As per the coding guidelines, documentation "should be concise, practical, and user-focused." Scannable structure directly supports that goal.
Suggested structure
- **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, the gateway scores candidates and routes to the best one (single-provider configs are unaffected — one candidate means no decision). `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`), `MODEL_ROUTING_STRATEGY_WEIGHTS` (`0.6,0.4` = cost_weight,latency_weight; balanced only), `MODEL_ROUTING_MAX_ERROR_RATE` (`0.5`; candidates at/above this smoothed error rate are filtered before scoring). Latency is tracked per provider as in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate), never persisted. Per-request overrides via request headers `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` (`cost,latency`); invalid header values are silently dropped (WARN log) and fall back to the global strategy. Explicit provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing. +- **Intelligent routing:** enabled when request has no pinned provider and multiple providers serve the same model (single-provider configs unaffected). + - **Configuration:** `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`) + - **Balanced weights:** `MODEL_ROUTING_STRATEGY_WEIGHTS` as `cost_weight,latency_weight` (default `0.6,0.4`; balanced strategy only) + - **Filtering:** `MODEL_ROUTING_MAX_ERROR_RATE` (default `0.5`); candidates at/above this smoothed error rate are excluded before scoring + - **Latency stats:** in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate); not persisted + - **Per-request overrides:** `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` headers; invalid values are WARN-logged and fall back to global strategy + - **Bypass rule:** provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CLAUDE.md` at line 134, The documentation for intelligent routing in CLAUDE.md contains a single very long sentence that covers multiple distinct configuration options and behaviors all at once, making it difficult for users to scan and reference individual features. Break this single dense bullet point into 4-6 separate focused sub-bullets, each covering one specific aspect: routing strategy options and defaults (MODEL_ROUTING_STRATEGY), weight configuration for balanced mode (MODEL_ROUTING_STRATEGY_WEIGHTS), error-rate filtering behavior (MODEL_ROUTING_MAX_ERROR_RATE), EWMA latency tracking mechanics, per-request header overrides and their fallback behavior, and explicit provider-qualified request bypass logic. Ensure each sub-bullet is concise and independently understandable while maintaining the technical accuracy of the original content.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/gateway/inference_prepare.go`:
- Around line 240-242: The comment preceding the return statement in this helper
function contradicts the actual behavior of the logic. The return condition
`firstSlash > 0 && firstSlash < len(m)-1` treats slash-style model IDs like
"meta-llama/Llama-3-70b" as provider-qualified (returns true), but the comment
claims these are not treated as provider-qualified. Update the comment to
accurately reflect that the function returns true when there is a valid provider
prefix followed by a model ID (i.e., when a slash exists that is neither at the
start nor at the end of the string).
In `@internal/llmclient/latency.go`:
- Around line 86-103: The Record method in LatencyTracker truncates
sub-millisecond latencies to zero when converting to milliseconds using
duration.Milliseconds(), which returns an integer. To preserve sub-millisecond
precision and avoid losing valid fast latency measurements, modify the
conversion to use floating point arithmetic without truncation. Convert the
duration to milliseconds as a float64 by either using duration.Seconds() and
multiplying by 1000, or by dividing duration.Nanoseconds() by 1e6, then pass
this precise value to the Add methods for t.p50 and t.p99.
In `@internal/providers/router_strategy_integration_test.go`:
- Line 75: The NewRouter constructor error is being discarded using the blank
identifier in the router initialization across multiple test locations. Replace
the blank identifier pattern with proper error handling by assigning the error
return value to a variable, then check if the error is not nil and immediately
fail the test using t.Fatalf with a descriptive message. Apply this same
error-checking pattern consistently at each NewRouter call site throughout the
integration test file to ensure setup failures are caught immediately rather
than causing misleading downstream assertion failures.
In `@internal/router/scoring.go`:
- Around line 165-176: The perMtokCost function currently returns an implicit 0
when Pricing exists but both InputPerMtok and OutputPerMtok are nil, which
incorrectly treats non-per-token pricing models as free. Add a check after
attempting to accumulate the token costs to return -1 if neither InputPerMtok
nor OutputPerMtok is present, ensuring unknown per-token pricing is treated as
unavailable rather than zero cost. Apply the same fix to the related code at
lines 201-213 that has similar logic.
In `@internal/router/strategy_test.go`:
- Around line 103-149: Add new test functions to cover the edge case where
candidates have pricing metadata but lack InputPerMtok and OutputPerMtok values.
Create test cases for both
TestCostOnly_DegradesToLatencyWhenPricingHasNoPerTokenCost and
TestLatencyOnly_DegradesToCostWhenPricingHasNoPerTokenCost that use the pricing
helper function but verify the strategies properly degrade to their fallback
metrics (latency for cost-only strategy, cost for latency-only strategy) when
per-token pricing values are absent. This ensures the selection logic correctly
treats incomplete pricing data as missing pricing for scoring purposes.
In `@internal/server/routing_strategy.go`:
- Around line 29-31: After trimming whitespace from the RoutingStrategyHeader
using strings.TrimSpace, also normalize the strategy value to lowercase before
passing it to router.WithStrategyOverride(). This ensures that case variants
like Latency_Only are properly handled and not silently dropped later due to
case mismatches, making the request parsing more permissive and following
Postel's Law for tolerant header handling.
---
Outside diff comments:
In `@coverage-bailian`:
- Around line 1-36: Remove the coverage-bailian file from version control as it
is a generated test coverage artifact that should not be committed.
Additionally, add coverage-bailian to the .gitignore file to prevent it from
being tracked in future commits, grouping it with other generated coverage files
like coverage.out and similar test artifacts. Use git to remove the file from
tracking while keeping it locally if needed for local testing purposes.
---
Duplicate comments:
In `@CLAUDE.md`:
- Line 134: The documentation for intelligent routing in CLAUDE.md contains a
single very long sentence that covers multiple distinct configuration options
and behaviors all at once, making it difficult for users to scan and reference
individual features. Break this single dense bullet point into 4-6 separate
focused sub-bullets, each covering one specific aspect: routing strategy options
and defaults (MODEL_ROUTING_STRATEGY), weight configuration for balanced mode
(MODEL_ROUTING_STRATEGY_WEIGHTS), error-rate filtering behavior
(MODEL_ROUTING_MAX_ERROR_RATE), EWMA latency tracking mechanics, per-request
header overrides and their fallback behavior, and explicit provider-qualified
request bypass logic. Ensure each sub-bullet is concise and independently
understandable while maintaining the technical accuracy of the original content.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 381fa4cf-069d-4358-b9be-aa1adcfa84bb
⛔ Files ignored due to path filters (3)
c.outis excluded by!**/*.outcoverage-bailian.outis excluded by!**/*.outgomodel.exeis excluded by!**/*.exe
📒 Files selected for processing (36)
.env.template.gitignore.idea/.gitignore.idea/GoModel.iml.idea/go.imports.xml.idea/golinter.xml.idea/modules.xml.idea/vcs.xmlCLAUDE.mdconfig/config.example.yamlconfig/config.goconfig/router.gocoverage-bailiangomodel.exe~internal/core/interfaces.gointernal/gateway/inference_prepare.gointernal/llmclient/client.gointernal/llmclient/latency.gointernal/llmclient/latency_test.gointernal/providers/init.gointernal/providers/registry.gointernal/providers/router.gointernal/providers/router_strategy_integration_test.gointernal/providers/strategy_init.gointernal/router/context.gointernal/router/registry.gointernal/router/registry_test.gointernal/router/scoring.gointernal/router/strategy.gointernal/router/strategy_balanced.gointernal/router/strategy_cost.gointernal/router/strategy_firstfit.gointernal/router/strategy_latency.gointernal/router/strategy_test.gointernal/server/http.gointernal/server/routing_strategy.go
| // A provider prefix is a non-empty segment before a slash that is NOT itself | ||
| // a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b"). | ||
| return firstSlash > 0 && firstSlash < len(m)-1 |
There was a problem hiding this comment.
Comment contradicts the helper’s actual behavior.
The comment says slash-style model IDs like meta-llama/Llama-3-70b are not treated as provider-qualified, but the return condition does treat that shape as true. Please align the comment with the implemented semantics (or adjust the logic if that comment reflects the real intent).
✏️ Minimal comment-only fix
- // A provider prefix is a non-empty segment before a slash that is NOT itself
- // a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b").
+ // Treat any non-edge "x/y" shape as provider-qualified model syntax.
+ // This intentionally includes slash-style model IDs.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/gateway/inference_prepare.go` around lines 240 - 242, The comment
preceding the return statement in this helper function contradicts the actual
behavior of the logic. The return condition `firstSlash > 0 && firstSlash <
len(m)-1` treats slash-style model IDs like "meta-llama/Llama-3-70b" as
provider-qualified (returns true), but the comment claims these are not treated
as provider-qualified. Update the comment to accurately reflect that the
function returns true when there is a valid provider prefix followed by a model
ID (i.e., when a slash exists that is neither at the start nor at the end of the
string).
| func (t *LatencyTracker) Record(duration time.Duration, isError bool) { | ||
| ms := float64(duration.Milliseconds()) | ||
| t.p50.Add(ms) | ||
| t.p99.Add(ms) | ||
| t.errorRate.Add(boolToFloat64(isError)) | ||
| } | ||
|
|
||
| // P50 returns the approximate p50 latency as a time.Duration. | ||
| // Returns zero before any sample has been recorded. | ||
| func (t *LatencyTracker) P50() time.Duration { | ||
| return time.Duration(t.p50.Value()) * time.Millisecond | ||
| } | ||
|
|
||
| // P99 returns the approximate p99 latency as a time.Duration. | ||
| // Returns zero before any sample has been recorded. | ||
| func (t *LatencyTracker) P99() time.Duration { | ||
| return time.Duration(t.p99.Value()) * time.Millisecond | ||
| } |
There was a problem hiding this comment.
Sub-millisecond latency samples are collapsed to “unknown.”
Line 87 truncates to whole milliseconds, so valid <1ms samples become 0; with routing semantics, that can be interpreted as missing latency data instead of observed fast latency.
Suggested fix
func (t *LatencyTracker) Record(duration time.Duration, isError bool) {
- ms := float64(duration.Milliseconds())
+ ms := float64(duration) / float64(time.Millisecond)
t.p50.Add(ms)
t.p99.Add(ms)
t.errorRate.Add(boolToFloat64(isError))
}
@@
func (t *LatencyTracker) P50() time.Duration {
- return time.Duration(t.p50.Value()) * time.Millisecond
+ return time.Duration(t.p50.Value() * float64(time.Millisecond))
}
@@
func (t *LatencyTracker) P99() time.Duration {
- return time.Duration(t.p99.Value()) * time.Millisecond
+ return time.Duration(t.p99.Value() * float64(time.Millisecond))
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/llmclient/latency.go` around lines 86 - 103, The Record method in
LatencyTracker truncates sub-millisecond latencies to zero when converting to
milliseconds using duration.Milliseconds(), which returns an integer. To
preserve sub-millisecond precision and avoid losing valid fast latency
measurements, modify the conversion to use floating point arithmetic without
truncation. Convert the duration to milliseconds as a float64 by either using
duration.Seconds() and multiplying by 1000, or by dividing
duration.Nanoseconds() by 1e6, then pass this precise value to the Add methods
for t.p50 and t.p99.
| registerModelWithMetadata(t, registry, pricey, "openai-west", "openai", "gpt-4o", | ||
| &core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(20)}) | ||
|
|
||
| r, _ := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry())) |
There was a problem hiding this comment.
Stop discarding NewRouter errors in integration tests.
These call sites ignore constructor errors, which can hide setup failures and produce misleading downstream assertions. Capture and fail immediately.
Suggested pattern (apply at each call site)
- r, _ := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
+ r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
+ if err != nil {
+ t.Fatalf("unexpected router construction error: %v", err)
+ }As per coding guidelines, **/*_test.go should cover error handling for behavior changes.
Also applies to: 101-101, 128-128, 163-163, 189-189, 211-211, 237-237, 275-275, 303-303
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/providers/router_strategy_integration_test.go` at line 75, The
NewRouter constructor error is being discarded using the blank identifier in the
router initialization across multiple test locations. Replace the blank
identifier pattern with proper error handling by assigning the error return
value to a variable, then check if the error is not nil and immediately fail the
test using t.Fatalf with a descriptive message. Apply this same error-checking
pattern consistently at each NewRouter call site throughout the integration test
file to ensure setup failures are caught immediately rather than causing
misleading downstream assertion failures.
Source: Coding guidelines
| func perMtokCost(c *ProviderCandidate) float64 { | ||
| if c == nil || c.Pricing == nil { | ||
| return -1 | ||
| } | ||
| var cost float64 | ||
| if c.Pricing.InputPerMtok != nil { | ||
| cost += *c.Pricing.InputPerMtok | ||
| } | ||
| if c.Pricing.OutputPerMtok != nil { | ||
| cost += *c.Pricing.OutputPerMtok | ||
| } | ||
| return cost |
There was a problem hiding this comment.
Treat non-per-token pricing as unknown instead of free.
Line 165 currently returns 0 when Pricing exists but both InputPerMtok and OutputPerMtok are nil. Combined with Line 201, candidates that only have PerRequest/PerImage/tier pricing are considered “pricing-present” and then scored as cheapest (0) in per-token routing. That produces wrong provider selection.
💡 Minimal fix
func perMtokCost(c *ProviderCandidate) float64 {
if c == nil || c.Pricing == nil {
return -1
}
+ if c.Pricing.InputPerMtok == nil && c.Pricing.OutputPerMtok == nil {
+ return -1
+ }
var cost float64
if c.Pricing.InputPerMtok != nil {
cost += *c.Pricing.InputPerMtok
}
if c.Pricing.OutputPerMtok != nil {
cost += *c.Pricing.OutputPerMtok
}
return cost
}
func hasPricingData(candidates []ProviderCandidate) bool {
for i := range candidates {
c := &candidates[i]
- if c.Pricing == nil {
- continue
- }
- if c.Pricing.InputPerMtok != nil || c.Pricing.OutputPerMtok != nil ||
- c.Pricing.CachedInputPerMtok != nil || c.Pricing.CacheWritePerMtok != nil ||
- c.Pricing.PerRequest != nil || c.Pricing.PerImage != nil ||
- c.Pricing.InputPerImage != nil || c.Pricing.PerSecondInput != nil ||
- c.Pricing.PerSecondOutput != nil || c.Pricing.PerCharacterInput != nil ||
- c.Pricing.PerPage != nil || len(c.Pricing.Tiers) > 0 {
+ if c.Pricing != nil &&
+ (c.Pricing.InputPerMtok != nil || c.Pricing.OutputPerMtok != nil) {
return true
}
}
return false
}Also applies to: 201-213
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/router/scoring.go` around lines 165 - 176, The perMtokCost function
currently returns an implicit 0 when Pricing exists but both InputPerMtok and
OutputPerMtok are nil, which incorrectly treats non-per-token pricing models as
free. Add a check after attempting to accumulate the token costs to return -1 if
neither InputPerMtok nor OutputPerMtok is present, ensuring unknown per-token
pricing is treated as unavailable rather than zero cost. Apply the same fix to
the related code at lines 201-213 that has similar logic.
| func TestCostOnly_PicksCheapest(t *testing.T) { | ||
| s := NewCostOnlyStrategy() | ||
| cands := candidates( | ||
| ProviderCandidate{ProviderName: "alpha", Pricing: pricing(5, 15), Latency: 10 * time.Millisecond}, | ||
| ProviderCandidate{ProviderName: "zeta", Pricing: pricing(1, 4), Latency: 500 * time.Millisecond}, | ||
| ) | ||
| got := mustSelect(t, s, cands) | ||
| if got.ProviderName != "zeta" { | ||
| t.Fatalf("expected azure (cheapest), got %s", got.ProviderName) | ||
| } | ||
| } | ||
|
|
||
| func TestCostOnly_DegradesToLatencyWhenNoPricing(t *testing.T) { | ||
| s := NewCostOnlyStrategy() | ||
| cands := candidates( | ||
| ProviderCandidate{ProviderName: "alpha", Latency: 200 * time.Millisecond}, | ||
| ProviderCandidate{ProviderName: "zeta", Latency: 50 * time.Millisecond}, | ||
| ) | ||
| got := mustSelect(t, s, cands) | ||
| if got.ProviderName != "zeta" { | ||
| t.Fatalf("expected azure (degraded to latency), got %s", got.ProviderName) | ||
| } | ||
| } | ||
|
|
||
| func TestLatencyOnly_PicksFastest(t *testing.T) { | ||
| s := NewLatencyOnlyStrategy() | ||
| cands := candidates( | ||
| ProviderCandidate{ProviderName: "alpha", Pricing: pricing(1, 1), Latency: 300 * time.Millisecond}, | ||
| ProviderCandidate{ProviderName: "zeta", Pricing: pricing(50, 50), Latency: 50 * time.Millisecond}, | ||
| ) | ||
| got := mustSelect(t, s, cands) | ||
| if got.ProviderName != "zeta" { | ||
| t.Fatalf("expected azure (fastest), got %s", got.ProviderName) | ||
| } | ||
| } | ||
|
|
||
| func TestLatencyOnly_DegradesToCostWhenNoLatency(t *testing.T) { | ||
| s := NewLatencyOnlyStrategy() | ||
| cands := candidates( | ||
| ProviderCandidate{ProviderName: "alpha", Pricing: pricing(5, 15)}, | ||
| ProviderCandidate{ProviderName: "zeta", Pricing: pricing(1, 4)}, | ||
| ) | ||
| got := mustSelect(t, s, cands) | ||
| if got.ProviderName != "zeta" { | ||
| t.Fatalf("expected azure (degraded to cost), got %s", got.ProviderName) | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add regression coverage for “non-per-token pricing” candidates.
There’s no test asserting that candidates with pricing metadata but without InputPerMtok/OutputPerMtok are treated as unknown for cost scoring. Please add one to prevent cost-only/balanced regressions in this edge case.
✅ Suggested test case
+func TestCostOnly_DegradesToLatencyWhenNoPerTokenPricing(t *testing.T) {
+ s := NewCostOnlyStrategy()
+ cands := candidates(
+ ProviderCandidate{
+ ProviderName: "alpha",
+ Pricing: &core.ModelPricing{PerRequest: floatPtr(1)}, // pricing exists, but not per-token
+ Latency: 200 * time.Millisecond,
+ },
+ ProviderCandidate{
+ ProviderName: "zeta",
+ Latency: 50 * time.Millisecond,
+ },
+ )
+ got := mustSelect(t, s, cands)
+ if got.ProviderName != "zeta" {
+ t.Fatalf("expected zeta (degraded to latency when per-token pricing absent), got %s", got.ProviderName)
+ }
+}As per coding guidelines, "Add or update tests for behavior changes in Go."
Also applies to: 164-197
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/router/strategy_test.go` around lines 103 - 149, Add new test
functions to cover the edge case where candidates have pricing metadata but lack
InputPerMtok and OutputPerMtok values. Create test cases for both
TestCostOnly_DegradesToLatencyWhenPricingHasNoPerTokenCost and
TestLatencyOnly_DegradesToCostWhenPricingHasNoPerTokenCost that use the pricing
helper function but verify the strategies properly degrade to their fallback
metrics (latency for cost-only strategy, cost for latency-only strategy) when
per-token pricing values are absent. This ensures the selection logic correctly
treats incomplete pricing data as missing pricing for scoring purposes.
Source: Coding guidelines
| if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" { | ||
| ctx = router.WithStrategyOverride(ctx, strategy) | ||
| } |
There was a problem hiding this comment.
Normalize strategy override before storing it in context.
Line [29] only trims whitespace; case variants like Latency_Only are treated as invalid and silently dropped later. Normalize to lowercase here to keep override handling permissive.
Suggested minimal fix
- if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" {
+ if strategy := strings.ToLower(strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader))); strategy != "" {
ctx = router.WithStrategyOverride(ctx, strategy)
}As per coding guidelines, **/*.go should accept requests generously and adapt them; based on learnings, this should follow Postel’s Law for tolerant request parsing.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" { | |
| ctx = router.WithStrategyOverride(ctx, strategy) | |
| } | |
| if strategy := strings.ToLower(strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader))); strategy != "" { | |
| ctx = router.WithStrategyOverride(ctx, strategy) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/server/routing_strategy.go` around lines 29 - 31, After trimming
whitespace from the RoutingStrategyHeader using strings.TrimSpace, also
normalize the strategy value to lowercase before passing it to
router.WithStrategyOverride(). This ensures that case variants like Latency_Only
are properly handled and not silently dropped later due to case mismatches,
making the request parsing more permissive and following Postel's Law for
tolerant header handling.
Sources: Coding guidelines, Learnings
Summary
When multiple providers serve the same model ID, the gateway now proactively selects the best provider based on a weighted cost + latency score — instead of the previous deterministic first-wins strategy where the first registered provider always handled the request.
Architecture
New components (8 files)
Routing strategies under
internal/router/:strategy.gostrategy_balanced.gostrategy_cost.gostrategy_latency.gostrategy_firstfit.goscoring.goregistry.gocontext.goSupporting:
internal/llmclient/latency.gointernal/core/interfaces.go(+ProviderStats)internal/server/routing_strategy.goconfig/router.goModified components
llmclient/client.goproviders/registry.goproviders/router.goproviders/init.goproviders/strategy_init.gogateway/inference_prepare.goconfig/config.goconfig/config.example.yaml.env.templateCLAUDE.mdKey design decisions
X-GoModel-Routing-StrategyandX-GoModel-Routing-Weights; invalid values silently fall back (WARN log)openai-east/gpt-4oorreq.Providernon-empty leads to exact route, no strategy involvementBugs fixed during development
Testing
Summary by CodeRabbit
router:configuration and documented the related environment/header options; updated example configuration and templates.