
feat: add support for retrying LLM requests on 429 ratelimits (#697)#733

Open
raheelshahzad wants to merge 3 commits into katanemo:main from raheelshahzad:feat/retry-on-ratelimit

Conversation


@raheelshahzad raheelshahzad commented Feb 10, 2026

  • Added 'retry_on_ratelimit' configuration to LlmProvider.
  • Implemented a retry loop in the LLM handler to automatically failover to an alternative model when a 429 status is received.
  • Added comprehensive unit tests for fallback selection and failover logic.
  • Ensured default behavior is unchanged when the feature is disabled.

fixes #697

Contributor

@adilhafeez adilhafeez left a comment

Thanks a lot for putting this change together @raheelshahzad. Please join our Discord channel too. Overall looks good!

I left some comments in the PR and have some additional suggestions/comments on the overall change:

  • we should do exponential backoff on retries
  • how do we ensure that we have not exceeded the request timeout?
  • max_retries should be defined somewhere in config.yaml; probably not in this PR, but we should let developers define that variable
  • this code change needs an update to the docs
  • I think we should allow retrying the same provider, or at least let developers choose whether they want to retry against a different provider. Consider the following example:
model_providers:
  - model: openai/gpt-4o
    base_url: https://dsna-oai.openai.azure.com
    access_key: $OPENAI_API_KEY
    retry_on_ratelimit: true # new feature
    retry_to_same_provider: true # this flag should only allow retry to same provider otherwise we should retry randomly to all models

  - model: openai/gpt-5
    base_url: https://dsna-oai.openai.azure.com
    access_key: $OPENAI_API_KEY

Comment on lines 95 to 104
self.providers.iter().find_map(|(key, provider)| {
    if provider.internal != Some(true)
        && provider.name != current_name
        && key == &provider.name
    {
        Some(Arc::clone(provider))
    } else {
        None
    }
})

should pick a random model
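A minimal sketch of the suggested random pick, assuming the candidate names have already been collected from the provider map. `pick_random_alternative` is a hypothetical stand-in for `get_alternative`; it uses the nanosecond clock as a stdlib-only index so the snippet stays self-contained, whereas production code would more idiomatically reach for the `rand` crate:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for `get_alternative`: filters out the current
/// model, then picks one of the remaining providers pseudo-randomly.
/// The nanosecond-clock index is a stdlib-only placeholder for `rand`.
fn pick_random_alternative<'a>(candidates: &'a [&'a str], current: &str) -> Option<&'a str> {
    let eligible: Vec<&str> = candidates
        .iter()
        .copied()
        .filter(|name| *name != current)
        .collect();
    if eligible.is_empty() {
        return None;
    }
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos() as usize)
        .unwrap_or(0);
    Some(eligible[nanos % eligible.len()])
}
```

The same filter used in the snippet above (`internal != Some(true)`, name differs from the current model) would apply when building the candidate list.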

Comment on lines 403 to 419
if res.status() == StatusCode::TOO_MANY_REQUESTS && attempts < max_attempts {
    let providers = llm_providers.read().await;
    if let Some(provider) = providers.get(&current_resolved_model) {
        if provider.retry_on_ratelimit == Some(true) {
            if let Some(alt_provider) = providers.get_alternative(&current_resolved_model) {
                info!(
                    request_id = %request_id,
                    current_model = %current_resolved_model,
                    alt_model = %alt_provider.name,
                    "429 received, retrying with alternative model"
                );
                current_resolved_model = alt_provider.name.clone();
                continue;
            }
        }
    }
}

we need to add exponential backoff
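A sketch of the delay computation, assuming the `base_interval`/`max_interval` knobs proposed later in this thread (names illustrative, not the merged API): the delay doubles per attempt and saturates at the cap.

```rust
use std::time::Duration;

/// Exponential backoff: base * 2^attempt, capped at `max`.
/// `base`/`max` mirror the base_interval/max_interval fields proposed
/// in this thread; they are illustrative, not the merged API.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}
```

Before each retry the handler would sleep for `backoff_delay(attempt, base, max)` (e.g. via `tokio::time::sleep`); adding random jitter on top avoids synchronized retry storms across clients.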

let mut current_resolved_model = resolved_model.clone();
let mut current_client_request = client_request;
let mut attempts = 0;
let max_attempts = 2; // Original + 1 retry

this should be configurable
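One way the hard-coded `max_attempts` could become configurable, sketched with a hypothetical per-provider `RetrySettings` (the `num_retries` field name is borrowed from the schema discussed in this thread; retries count attempts beyond the original request):

```rust
/// Hypothetical per-provider retry settings; `num_retries` counts retries
/// beyond the original request, so total attempts = num_retries + 1.
struct RetrySettings {
    num_retries: u32,
}

impl Default for RetrySettings {
    fn default() -> Self {
        // No retries unless the developer opts in via config.
        RetrySettings { num_retries: 0 }
    }
}

/// Total attempts for a provider; `None` means no retry_policy configured.
fn max_attempts(settings: Option<&RetrySettings>) -> u32 {
    settings.map(|s| s.num_retries).unwrap_or_default() + 1
}
```

This keeps the default behavior unchanged (a single attempt) when the feature is not configured, matching the PR description.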

);
// Capture start time right before sending request to upstream
let request_start_time = std::time::Instant::now();
let _request_start_system_time = std::time::SystemTime::now();

dead code?


adilhafeez commented Feb 10, 2026

I looked through Envoy's retry semantics: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-field-config-route-v3-routeaction-retry-policy

I think we should lean toward this design for retries. We don't have to implement it completely, but we should implement a bare minimum that follows similar semantics/config. Thoughts?

raheelshahzad force-pushed the feat/retry-on-ratelimit branch from d1aa3ac to ca903d2 on February 12, 2026 at 04:08
Author

@raheelshahzad raheelshahzad left a comment


The updated push addresses the review feedback:

  1. Exponential backoff with configurable base and max intervals.
  2. Configurable max_retries.
  3. retry_to_same_provider option.
  4. Random alternative selection when failing over to a different model.
  5. Documentation updates in the reference configuration.
  6. Comprehensive unit tests for all the above.


adilhafeez commented Feb 12, 2026

Thanks a lot, Raheel, for continuing to make plano better. We are getting there.

This may be a slightly better way to specify retries:

  model_providers:
    - model: openai/gpt-4o
      access_key: $OPENAI_API_KEY
      default: true
      retry_policy:
        num_retries: 2
        # retry_on: [429]             # default
        # back_off:
        #   base_interval: 25ms       # default
        #   max_interval: 250ms       # default (10x base)
        # failover:
        #   strategy: same_provider   # default

    # Need more control
    - model: anthropic/claude-sonnet-4-0
      access_key: $ANTHROPIC_API_KEY
      retry_policy:
        num_retries: 3
        failover:
          strategy: any

    # Full control
    - model: openai/gpt-4o-mini
      access_key: $OPENAI_API_KEY
      retry_policy:
        num_retries: 2
        retry_on: [429, 503]
        back_off:
          base_interval: 100ms
          max_interval: 2000ms
        failover:
          providers:
            - anthropic/claude-sonnet-4-0

    # No retries (default, just omit retry_policy)
    - model: mistral/ministral-3b-latest
      access_key: $MISTRAL_API_KEY
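For illustration, the schema above could map onto plain Rust config types roughly like this (a sketch: field and variant names mirror the YAML; serde derives, duration parsing, and validation are omitted):

```rust
use std::time::Duration;

/// Sketch of the proposed retry_policy schema; names mirror the YAML above.
struct RetryPolicy {
    num_retries: u32,
    retry_on: Vec<u16>, // HTTP status codes, default [429]
    back_off: BackOff,
    failover: Failover,
}

struct BackOff {
    base_interval: Duration,
    max_interval: Duration,
}

enum Failover {
    SameProvider,           // strategy: same_provider (default)
    Any,                    // strategy: any
    Providers(Vec<String>), // explicit provider list
}

impl Default for RetryPolicy {
    fn default() -> Self {
        // Defaults follow the commented-out values in the first example.
        RetryPolicy {
            num_retries: 0,
            retry_on: vec![429],
            back_off: BackOff {
                base_interval: Duration::from_millis(25),
                max_interval: Duration::from_millis(250),
            },
            failover: Failover::SameProvider,
        }
    }
}
```

Omitting `retry_policy` in the YAML would then deserialize to `RetryPolicy::default()` with zero retries, preserving today's behavior.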



Development

Successfully merging this pull request may close these issues.

Add ability to retry to other model if 429 is received

2 participants