feat(shared): exponential backoff retry and graceful degradation #167
Open
zong0728 wants to merge 1 commit into kubeflow:main
Implements GSoC 2026 Agentic RAG spec Requirement #5: 'Robust retry logic is a must for all tools. The agent implements exponential backoff with jitter for Vector DB retrievals and LLM API timeouts. If tools strictly fail, the agent is configured to transparently degrade, informing the user that Live code context is currently unreachable.'

Changes:
- shared/retry.py: reusable @with_retry decorator supporting both sync and async callables; uses AWS full-jitter strategy (random.uniform(0, delay)) to prevent thundering-herd on retry; exposes DEGRADED_RESULT sentinel string for LLM-visible outage messages
- server/app.py, server-https/app.py:
  * milvus_search: remove silent exception swallow; add @with_retry (3 attempts, base 1s, max 10s, factor 2x + jitter); encoder loaded once at module level via _get_encoder() singleton
  * execute_tool: offload blocking milvus_search to asyncio.to_thread (websocket server) and run_in_threadpool (FastAPI server) so the async event loop stays responsive under concurrent load; on retry exhaustion return DEGRADED_RESULT so the LLM communicates the outage to the user instead of silently hallucinating from empty context

Signed-off-by: Shengzhong Guan <guan@cmu.edu>
Made-with: Cursor
Signed-off-by: Shengzhong Guan <guan@cmu.edu>
Summary
Fixes #160
milvus_search() silently swallows all exceptions and returns {"results": []}, preventing retry on transient failures and causing the LLM to hallucinate from empty context instead of informing the user that the service is unavailable.
Root Cause
Both servers wrap the entire Milvus operation in a broad try/except that discards the error and returns a fake success.
Because the exception never propagates, there is no opportunity to retry. execute_tool receives an empty result set and passes it to the LLM as if the search had succeeded.
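The snippet from the two servers is not reproduced here, so the following is only a minimal sketch of the pattern described above, not the project's actual code. `search_backend` is a hypothetical stand-in for the Milvus query, and the two wrapper names are made up for illustration. The first wrapper shows how a failure becomes indistinguishable from "no matches"; the second logs and re-raises so that a retry layer or the caller can react.

```python
import logging

logger = logging.getLogger(__name__)


def search_backend(query: str) -> list:
    """Hypothetical stand-in for the Milvus query; it always fails here."""
    raise ConnectionError("vector DB unreachable")


def milvus_search_swallowing(query: str) -> dict:
    """Anti-pattern: the error is discarded and an empty success is returned."""
    try:
        return {"results": search_backend(query)}
    except Exception:
        # The caller cannot tell this apart from a search with zero hits.
        return {"results": []}


def milvus_search_propagating(query: str) -> dict:
    """Log the failure and re-raise so it can be retried or reported."""
    try:
        return {"results": search_backend(query)}
    except Exception:
        logger.exception("search failed for query %r", query)
        raise
```

With the swallowing variant, `execute_tool` has no signal that anything went wrong; with the propagating variant, the exception reaches whatever wraps the call.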
Failure Scenario
1. User asks a Kubeflow-specific question
2. LLM correctly calls search_kubeflow_docs
3. Milvus pod is restarting (transient unavailability)
4. milvus_search fails → exception caught → returns {"results": []} ❌
5. execute_tool sends "No relevant results found." to the LLM
6. LLM generates a hallucinated answer; the user never knows the search failed
Fix
- shared/retry.py: reusable @with_retry decorator (sync + async) using AWS full-jitter exponential backoff (sleep = random.uniform(0, min(cap, base * 2^n))); exposes DEGRADED_RESULT sentinel string for LLM-visible outage messages
- milvus_search: remove silent exception swallow; decorated with @with_retry (max_attempts=3, base_delay=1s, backoff_factor=2x, jitter=True); encoder loaded once at module level via _get_encoder() singleton
- execute_tool: offload blocking search to asyncio.to_thread (WebSocket server) and run_in_threadpool (FastAPI server); return DEGRADED_RESULT on retry exhaustion so the LLM communicates the outage to the user
Testing
Python syntax verified
Simulated Milvus failure triggers 3 retry attempts with increasing delays
DEGRADED_RESULT returned to LLM after retry exhaustion
Single-turn happy-path behaviour unchanged
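To make the retry and degradation behaviour concrete, here is a minimal sketch of full-jitter backoff with a simulated failure. It is not the contents of shared/retry.py: the sync-only decorator body, the `full_jitter_delay` helper, the `flaky_search` function, the injectable `sleep` parameter, and the wording of `DEGRADED_RESULT` are all assumptions made for illustration. Only the formula sleep = random.uniform(0, min(cap, base * 2^n)), the parameter names, and the use of asyncio.to_thread (Python 3.9+) come from the description.

```python
import asyncio
import functools
import random
import time

# Assumed wording; the actual sentinel text in the PR is not shown.
DEGRADED_RESULT = "Live code context is currently unreachable."


def full_jitter_delay(attempt, base_delay=1.0, max_delay=10.0, backoff_factor=2.0):
    """Delay before the next try: uniform(0, min(cap, base * factor**attempt))."""
    ceiling = min(max_delay, base_delay * backoff_factor ** attempt)
    return random.uniform(0.0, ceiling)


def with_retry(max_attempts=3, base_delay=1.0, max_delay=10.0,
               backoff_factor=2.0, sleep=time.sleep):
    """Sync-only sketch of a retry decorator; re-raises after the last attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # retries exhausted: let the caller degrade
                    sleep(full_jitter_delay(attempt, base_delay,
                                            max_delay, backoff_factor))
        return wrapper
    return decorator


calls = {"count": 0}  # counts how often the simulated search is attempted


@with_retry(max_attempts=3, base_delay=0.01, max_delay=0.05)
def flaky_search(query):
    """Hypothetical search that always fails, simulating a restarting pod."""
    calls["count"] += 1
    raise ConnectionError("Milvus pod restarting")


async def execute_tool(query):
    """Run the blocking search off the event loop; degrade if it still fails."""
    try:
        result = await asyncio.to_thread(flaky_search, query)
    except Exception:
        return DEGRADED_RESULT
    return str(result)
```

Calling `asyncio.run(execute_tool("..."))` in this sketch attempts the search three times and then returns the sentinel string. Note that with full jitter only the upper bound of each delay grows (base, 2x base, and so on up to the cap); an individual sleep is drawn uniformly below that bound, so consecutive delays are not guaranteed to increase.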
Checklist
Commits are signed off (DCO)
Fixes #
Implements GSoC 2026 spec Requirement #5 (exponential backoff + graceful degradation)
No regressions to single-turn query paths