feat: add support for retrying LLM requests on 429 ratelimits (#697) #733
raheelshahzad wants to merge 3 commits into katanemo:main
Conversation
Thanks a lot for putting this change together @raheelshahzad . Please join our discord channel too. Overall looks good!
I left some comments in the PR and have some additional suggestions on the overall change:
- we should do exponential backoff on retries
- how do we ensure that we have not exceeded the overall request timeout?
- max_retries should be defined somewhere in config.yaml; probably not in this PR, but we should let developers define that var
- this code change needs an update to the docs
- I think we should allow retry to the same provider, or at least let developers define whether they want to retry on a different provider. Consider the following example:
```yaml
model_providers:
  - model: openai/gpt-4o
    base_url: https://dsna-oai.openai.azure.com
    access_key: $OPENAI_API_KEY
    retry_on_ratelimit: true      # new feature
    retry_to_same_provider: true  # this flag should only allow retry to the same provider; otherwise we should retry randomly across all models
  - model: openai/gpt-5
    base_url: https://dsna-oai.openai.azure.com
    access_key: $OPENAI_API_KEY
```
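The exponential-backoff point above could be sketched roughly as follows. The helper name and the 25ms/250ms defaults are illustrative assumptions, not part of this PR:

```rust
use std::time::Duration;

/// Compute the delay before the nth retry (0-based): double a base
/// interval each attempt and clamp it at a maximum. Illustrative only.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

fn main() {
    let base = Duration::from_millis(25);
    let max = Duration::from_millis(250);
    for attempt in 0..5u32 {
        // 25ms, 50ms, 100ms, 200ms, then clamped at 250ms
        println!("retry {attempt}: wait {:?}", backoff_delay(attempt, base, max));
    }
}
```

In practice the computed delay would be passed to `tokio::time::sleep` inside the retry loop; adding jitter on top is a common refinement.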
crates/common/src/llm_providers.rs (outdated)
```rust
self.providers.iter().find_map(|(key, provider)| {
    if provider.internal != Some(true)
        && provider.name != current_name
        && key == &provider.name
    {
        Some(Arc::clone(provider))
    } else {
        None
    }
})
```
should pick random model
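A minimal sketch of what random fallback selection could look like. All names here are illustrative (the real code works over the `LlmProviders` map), and the index is caller-supplied to keep the example deterministic; in production it would come from an RNG such as the `rand` crate:

```rust
/// Pick an eligible fallback from the provider list, skipping the provider
/// that just rate-limited us. `pick` is an arbitrary index source so the
/// choice is testable; production code would draw it from an RNG.
fn pick_alternative<'a>(providers: &[&'a str], current: &str, pick: usize) -> Option<&'a str> {
    let eligible: Vec<&'a str> = providers
        .iter()
        .copied()
        .filter(|p| *p != current)
        .collect();
    if eligible.is_empty() {
        None
    } else {
        Some(eligible[pick % eligible.len()])
    }
}

fn main() {
    let providers = ["gpt-4o", "gpt-5", "claude-sonnet-4-0"];
    println!("{:?}", pick_alternative(&providers, "gpt-4o", 1));
}
```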
```rust
if res.status() == StatusCode::TOO_MANY_REQUESTS && attempts < max_attempts {
    let providers = llm_providers.read().await;
    if let Some(provider) = providers.get(&current_resolved_model) {
        if provider.retry_on_ratelimit == Some(true) {
            if let Some(alt_provider) = providers.get_alternative(&current_resolved_model) {
                info!(
                    request_id = %request_id,
                    current_model = %current_resolved_model,
                    alt_model = %alt_provider.name,
                    "429 received, retrying with alternative model"
                );
                current_resolved_model = alt_provider.name.clone();
                continue;
            }
        }
    }
}
```
we need to add exponential backoff
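One way to add backoff here while also addressing the earlier request-timeout concern is to check the remaining budget before sleeping. This is a sketch with hypothetical names, not the PR's actual code:

```rust
use std::time::{Duration, Instant};

/// How much of the overall request budget remains, if any.
fn time_left(start: Instant, timeout: Duration) -> Option<Duration> {
    timeout.checked_sub(start.elapsed())
}

/// Only retry if the backoff delay still fits inside the remaining budget.
fn should_retry(start: Instant, timeout: Duration, backoff: Duration) -> bool {
    matches!(time_left(start, timeout), Some(left) if left > backoff)
}

fn main() {
    let start = Instant::now();
    // Plenty of budget left: retry is worthwhile.
    assert!(should_retry(start, Duration::from_secs(30), Duration::from_millis(100)));
    // Budget already exhausted: give up instead of sleeping past the deadline.
    assert!(!should_retry(start, Duration::ZERO, Duration::from_millis(100)));
    println!("ok");
}
```

In the handler this check would sit just before a `tokio::time::sleep(backoff)` in the retry loop.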
```rust
let mut current_resolved_model = resolved_model.clone();
let mut current_client_request = client_request;
let mut attempts = 0;
let max_attempts = 2; // Original + 1 retry
```
this should be configurable
```rust
);
// Capture start time right before sending request to upstream
let request_start_time = std::time::Instant::now();
let _request_start_system_time = std::time::SystemTime::now();
```
I looked through envoy retry semantics (https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-field-config-route-v3-routeaction-retry-policy) and I think we should lean toward this design for retries. We don't have to implement it completely, but we should implement the bare minimum while following similar semantics / config. Thoughts?
…mo#697)
- Added 'retry_on_ratelimit' configuration to LlmProvider.
- Implemented a retry loop in the LLM handler to automatically failover to an alternative model when a 429 status is received.
- Added comprehensive unit tests for fallback selection and failover logic.
- Ensured default behavior is unchanged when the feature is disabled.
…andom fallback selection
d1aa3ac to ca903d2
raheelshahzad left a comment
- Exponential backoff with configurable base and max intervals.
- Configurable `max_retries`.
- `retry_to_same_provider` option.
- Random alternative selection when failing over to a different model.
- Documentation updates in the reference configuration.
- Comprehensive unit tests for all the above.
Thanks a lot Raheel for continuing to make plano better. We are getting there. This may be a slightly better way to specify retries:

```yaml
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
    retry_policy:
      num_retries: 2
      # retry_on: [429]            # default
      # back_off:
      #   base_interval: 25ms      # default
      #   max_interval: 250ms      # default (10x base)
      # failover:
      #   strategy: same_provider  # default

  # Need more control
  - model: anthropic/claude-sonnet-4-0
    access_key: $ANTHROPIC_API_KEY
    retry_policy:
      num_retries: 3
      failover:
        strategy: any

  # Full control
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    retry_policy:
      num_retries: 2
      retry_on: [429, 503]
      back_off:
        base_interval: 100ms
        max_interval: 2000ms
      failover:
        providers:
          - anthropic/claude-sonnet-4-0

  # No retries (default; just omit retry_policy)
  - model: mistral/ministral-3b-latest
    access_key: $MISTRAL_API_KEY
```
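For illustration, the proposed retry_policy block might map onto a Rust struct along these lines, with the commented defaults baked into `Default`. This is a sketch under assumed names, not the PR's actual types:

```rust
use std::time::Duration;

/// Hypothetical shape for a provider's `retry_policy` config section.
/// Field names mirror the YAML sketch above; defaults are the ones the
/// comments suggest (retry on 429, 25ms base, 250ms max interval).
#[derive(Debug, Clone, PartialEq)]
struct RetryPolicy {
    num_retries: u32,
    retry_on: Vec<u16>,
    base_interval: Duration,
    max_interval: Duration,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        RetryPolicy {
            num_retries: 0, // omitting retry_policy means no retries
            retry_on: vec![429],
            base_interval: Duration::from_millis(25),
            max_interval: Duration::from_millis(250), // 10x base
        }
    }
}

fn main() {
    let policy = RetryPolicy::default();
    println!("{policy:?}");
}
```

In the real config this would presumably derive `serde::Deserialize` with `#[serde(default)]` on each field, so every key stays optional.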
fixes #697