Migrate web search to Tavily and refactor imports#5
Conversation
📝 WalkthroughWalkthroughThis PR introduces Tavily-based web search replacing Google Custom Search, implements QA semantic caching, adds runtime configuration support, refactors imports to absolute paths, creates a preflight credential validation system, and enhances the interactive CLI with dynamic runtime adjustments and status reporting. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can scan for known vulnerabilities in your dependencies using OSV Scanner.OSV Scanner will automatically detect and report security vulnerabilities in your project's dependencies. No additional configuration is required. |
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (4)
src/utils/preflight.py (1)
28-37: Prefer immutable credential definitions at class scope.
REQUIRED_CREDENTIALSandOPTIONAL_CREDENTIALSare constants semantically; using tuples avoids accidental runtime mutation.Suggested refactor
- REQUIRED_CREDENTIALS = [ + REQUIRED_CREDENTIALS = ( ("github_token", "GitHub Token", "Required for LLM and embeddings"), ("surrealdb_url", "SurrealDB URL", "Database connection"), ("surrealdb_user", "SurrealDB Username", "Database authentication"), ("surrealdb_pass", "SurrealDB Password", "Database authentication"), - ] + ) - OPTIONAL_CREDENTIALS = [ + OPTIONAL_CREDENTIALS = ( ("tavily_api_key", "Tavily API Key", "Required for web search"), - ] + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/preflight.py` around lines 28 - 37, REQUIRED_CREDENTIALS and OPTIONAL_CREDENTIALS are defined as mutable lists; change them to immutable tuples and move them to the appropriate class as class-level constants (e.g., Preflight.REQUIRED_CREDENTIALS and Preflight.OPTIONAL_CREDENTIALS) so they cannot be mutated at runtime; keep the same tuple entries and names to preserve usage.src/vectorstore/surrealdb_store.py (1)
23-26: Return type mismatch with interface declaration.The interface declares
embedding_functionreturnsEmbeddingInterface, but this implementation returnsGithubEmbeddings. This works at runtime (assumingGithubEmbeddingsis a subclass), but the type annotation should match the abstract declaration for consistency.♻️ Suggested fix
+from src.core.interfaces import EmbeddingInterface + `@property` - def embedding_function(self) -> GithubEmbeddings: + def embedding_function(self) -> EmbeddingInterface: """Get the embedding function used by this vector store""" return self._embedding_function🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/vectorstore/surrealdb_store.py` around lines 23 - 26, The embedding_function property currently types its return as GithubEmbeddings but the interface declares EmbeddingInterface; update the annotation on the embedding_function property to return EmbeddingInterface (and ensure the _embedding_function field is typed accordingly) so the signature matches the abstract/interface declaration (replace GithubEmbeddings -> EmbeddingInterface in the property return type and related attribute type hints).tests/test_parallelism.py (2)
58-64:_has_gather_in_functiondoesn't verify the call isasyncio.gather.Similar to the sleep check, this only verifies
.gatherattribute access. A false positive would occur if any other module's.gather()method exists in the function.♻️ Suggested improvement
def _has_gather_in_function(func_node: ast.AST) -> bool: """Return True if *func_node* contains an asyncio.gather call.""" for node in ast.walk(func_node): if isinstance(node, ast.Call): - if isinstance(node.func, ast.Attribute) and node.func.attr == "gather": - return True + if isinstance(node.func, ast.Attribute) and node.func.attr == "gather": + # Verify it's asyncio.gather + if isinstance(node.func.value, ast.Name) and node.func.value.id == "asyncio": + return True return False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_parallelism.py` around lines 58 - 64, The function _has_gather_in_function currently only checks for an Attribute named "gather" and can false-positive on other modules; update it to ensure the call target is specifically asyncio.gather by checking that node is an ast.Call, node.func is an ast.Attribute with attr == "gather", and node.func.value is an ast.Name with id == "asyncio" (or handle a dotted attribute chain where the base Name is "asyncio"), returning True only in that case; use the existing symbols ast.Call, node.func, ast.Attribute, and _has_gather_in_function to locate and change the logic.
42-55:_collect_sleep_callsmay produce false positives for non-asyncio sleep calls.The helper only checks
func.attr == "sleep"without verifying the value is actuallyasyncio. This could matchtime.sleep(...),some_lib.sleep(...), or any other.sleep()attribute call.Consider tightening the check:
♻️ Suggested improvement
def _collect_sleep_calls(tree: ast.AST) -> List[ast.Await]: """Return all AST Await nodes that call asyncio.sleep(...).""" sleeps = [] for node in ast.walk(tree): if not isinstance(node, ast.Await): continue call = node.value if not isinstance(call, ast.Call): continue func = call.func # Accept both `asyncio.sleep(...)` (Attribute) forms - if isinstance(func, ast.Attribute) and func.attr == "sleep": - sleeps.append(node) + if isinstance(func, ast.Attribute) and func.attr == "sleep": + # Verify it's asyncio.sleep specifically + if isinstance(func.value, ast.Name) and func.value.id == "asyncio": + sleeps.append(node) return sleeps🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_parallelism.py` around lines 42 - 55, The helper _collect_sleep_calls is currently matching any attribute named "sleep" (e.g., time.sleep) and should be tightened to only match asyncio.sleep; update the conditional inside _collect_sleep_calls so it requires isinstance(func, ast.Attribute) and func.attr == "sleep" and that func.value is an ast.Name with id == "asyncio" (i.e., check isinstance(func.value, ast.Name) and func.value.id == "asyncio") so only asyncio.sleep(Attribute) calls are collected; if you need to also support "from asyncio import sleep" cases, handle ast.Name(func) separately but only after resolving imports/aliases to confirm it comes from asyncio.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@check_tavily.py`:
- Around line 50-56: The check_settings() block currently accepts any non-empty
settings.tavily_api_key (causing false positives for placeholder values like
"your_tavily_api_key_here"); update check_settings() to trim whitespace and
reject known placeholder patterns (e.g., exact matches like
"your_tavily_api_key_here" or values that start with "your_" or contain
"example" or "placeholder") before returning True; when rejecting, print the
existing error message (or a slightly more specific one) and return False so
only real API keys pass.
In `@src/config/settings.py`:
- Around line 83-84: The qa_cache_similarity_threshold Field currently accepts
any float and should be constrained to [0.0, 1.0]; update the Field definition
for qa_cache_similarity_threshold in settings.py (the
qa_cache_similarity_threshold symbol) to enforce bounds (e.g., use
Field(default=0.8, ge=0.0, le=1.0) or an equivalent pydantic constrained type)
so values outside 0–1 are rejected/validated.
In `@src/llm/github_llm.py`:
- Around line 130-131: The current synchronous client.complete calls block the
event loop; switch to the async client by importing
azure.ai.inference.aio.ChatCompletionsClient, instantiate it in the class
__init__ (assign to self.client), and replace synchronous calls to
self.client.complete(...) inside async methods with awaited calls (await
self.client.complete(...)); ensure the client is closed/async-disposed where
appropriate and update any initialization/cleanup logic used by the class that
creates the client.
- Around line 83-89: The method _get_max_tokens_param currently drops token
limits when self._use_completion_tokens is True; instead, return the Azure
pass-through form so limits are honored: when self._use_completion_tokens is
True return a dict that supplies model_extras with the key
"max_completion_tokens" set to max_tokens_value (e.g. {"model_extras":
{"max_completion_tokens": max_tokens_value}}), and update the inline comment to
remove the incorrect claim; keep the existing {"max_tokens": max_tokens_value}
return for the non-completion branch and ensure any callers that set headers
include the required "extra-parameters: pass-through" behavior elsewhere in the
codebase.
In `@src/rag/reflexion_engine.py`:
- Around line 595-626: The QA cache storage path can call store_qa_cache with a
None embedding if embed_text fails twice; update the block guarded by
settings.qa_cache_enabled / final_answer to ensure a valid question_embedding
before calling self.vector_store.store_qa_cache (e.g., attempt embedding via
self.vector_store.embedding_function.embed_text and if it returns None or
raises, skip storing and log a warning), or wrap the entire store_qa_cache call
in a try/except that validates question_embedding is not None before invoking
store_qa_cache (refer to symbols question_embedding,
self.vector_store.embedding_function.embed_text,
self.vector_store.store_qa_cache, reflexion_memory, and logger) and include the
actual exception details in logger.error when storage or embedding fails.
In `@src/utils/input_handler.py`:
- Around line 112-115: The multiline cancel path in _get_multiline_input
currently re-enters get_input(prompt) recursively, which can cause unbounded
recursion; change the cancel behavior so _get_multiline_input returns a sentinel
(e.g., None or empty string) instead of calling self.get_input, and update
get_input to handle that sentinel by iterating/re-prompting in a loop (or
repeating the input flow) rather than calling itself recursively. Specifically
modify _get_multiline_input (the branch that checks for "\\e") to return a value
indicating "cancelled" and adjust get_input to detect that return, continue its
prompt loop, and present the single-line prompt again without recursion.
In `@src/utils/preflight.py`:
- Around line 47-63: The presence check treats whitespace-only credential
strings as valid; update the logic in the preflight check (where value =
getattr(settings, attr_name, None) and the subsequent is_present/source
determination) to trim string values first (e.g., apply .strip() to value when
it's a str) and then perform the github_token prefix checks and the "your_"
placeholder check against the trimmed value so that strings composed only of
whitespace are considered missing.
In `@src/vectorstore/surrealdb_store.py`:
- Around line 478-487: The SQL built into the query variable currently
interpolates Python values directly (query_embedding and threshold) which risks
injection; change this to a parameterized SurrealDB query using placeholders
(e.g. $query_embedding and $threshold) instead of f-string interpolation and
pass a validated params dict when executing, or—if SurrealDB parameterization
isn't available—perform strict defensive validation/conversion: ensure
query_embedding is explicitly converted to a List[float] and threshold to a
float before embedding, and only then format a safe, canonical representation;
update the call that executes this query to use the new params and reference the
qa_history table and vector::similarity::cosine usage accordingly.
In `@src/websearch/tavily_search.py`:
- Around line 95-101: The current logic can mark empty or None full_content as
SUCCESS; update the branching around word_count/full_content so empty content is
explicitly handled: if full_content is falsy or word_count == 0 set status to
WebSearchStatus.NO_CONTENT (or WebSearchStatus.LOW_QUALITY if NO_CONTENT doesn't
exist), else compute min_length = settings.web_search_min_content_length and if
len(full_content) < min_length set status = WebSearchStatus.LOW_QUALITY,
otherwise set status = WebSearchStatus.SUCCESS; adjust the block around
variables full_content, word_count, min_length, and the WebSearchStatus
assignments accordingly.
---
Nitpick comments:
In `@src/utils/preflight.py`:
- Around line 28-37: REQUIRED_CREDENTIALS and OPTIONAL_CREDENTIALS are defined
as mutable lists; change them to immutable tuples and move them to the
appropriate class as class-level constants (e.g., Preflight.REQUIRED_CREDENTIALS
and Preflight.OPTIONAL_CREDENTIALS) so they cannot be mutated at runtime; keep
the same tuple entries and names to preserve usage.
In `@src/vectorstore/surrealdb_store.py`:
- Around line 23-26: The embedding_function property currently types its return
as GithubEmbeddings but the interface declares EmbeddingInterface; update the
annotation on the embedding_function property to return EmbeddingInterface (and
ensure the _embedding_function field is typed accordingly) so the signature
matches the abstract/interface declaration (replace GithubEmbeddings ->
EmbeddingInterface in the property return type and related attribute type
hints).
In `@tests/test_parallelism.py`:
- Around line 58-64: The function _has_gather_in_function currently only checks
for an Attribute named "gather" and can false-positive on other modules; update
it to ensure the call target is specifically asyncio.gather by checking that
node is an ast.Call, node.func is an ast.Attribute with attr == "gather", and
node.func.value is an ast.Name with id == "asyncio" (or handle a dotted
attribute chain where the base Name is "asyncio"), returning True only in that
case; use the existing symbols ast.Call, node.func, ast.Attribute, and
_has_gather_in_function to locate and change the logic.
- Around line 42-55: The helper _collect_sleep_calls is currently matching any
attribute named "sleep" (e.g., time.sleep) and should be tightened to only match
asyncio.sleep; update the conditional inside _collect_sleep_calls so it requires
isinstance(func, ast.Attribute) and func.attr == "sleep" and that func.value is
an ast.Name with id == "asyncio" (i.e., check isinstance(func.value, ast.Name)
and func.value.id == "asyncio") so only asyncio.sleep(Attribute) calls are
collected; if you need to also support "from asyncio import sleep" cases, handle
ast.Name(func) separately but only after resolving imports/aliases to confirm it
comes from asyncio.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 880b9a68-fb05-40ea-84f0-4439d5e95e69
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (16)
.env.examplecheck_tavily.pypyproject.tomlrag.pyschema/qa_cache_lookup.surqlsrc/config/settings.pysrc/core/interfaces.pysrc/embeddings/github_embeddings.pysrc/llm/github_llm.pysrc/rag/engine.pysrc/rag/reflexion_engine.pysrc/utils/input_handler.pysrc/utils/preflight.pysrc/vectorstore/surrealdb_store.pysrc/websearch/tavily_search.pytests/test_parallelism.py
🚧 Files skipped from review as they are similar to previous changes (1)
- schema/qa_cache_lookup.surql
| key = settings.tavily_api_key | ||
| if key: | ||
| print(" OK : settings.tavily_api_key loaded") | ||
| return True | ||
| else: | ||
| print(" ERROR: settings.tavily_api_key is empty.") | ||
| return False |
There was a problem hiding this comment.
Apply placeholder filtering in settings check as well.
On Line 51, check_settings() treats any non-empty value as valid, including your_tavily_api_key_here, so this step can report a false positive.
Suggested fix
- if key:
+ if key and key != "your_tavily_api_key_here":
print(" OK : settings.tavily_api_key loaded")
return True
else:
print(" ERROR: settings.tavily_api_key is empty.")
return False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@check_tavily.py` around lines 50 - 56, The check_settings() block currently
accepts any non-empty settings.tavily_api_key (causing false positives for
placeholder values like "your_tavily_api_key_here"); update check_settings() to
trim whitespace and reject known placeholder patterns (e.g., exact matches like
"your_tavily_api_key_here" or values that start with "your_" or contain
"example" or "placeholder") before returning True; when rejecting, print the
existing error message (or a slightly more specific one) and return False so
only real API keys pass.
| qa_cache_enabled: bool = Field(default=True) | ||
| qa_cache_similarity_threshold: float = Field(default=0.8) |
There was a problem hiding this comment.
Constrain QA cache similarity threshold range.
On Line 84, qa_cache_similarity_threshold accepts any float. Values outside [0.0, 1.0] can silently break cache hit behavior.
Suggested fix
- qa_cache_similarity_threshold: float = Field(default=0.8)
+ qa_cache_similarity_threshold: float = Field(default=0.8, ge=0.0, le=1.0)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| qa_cache_enabled: bool = Field(default=True) | |
| qa_cache_similarity_threshold: float = Field(default=0.8) | |
| qa_cache_enabled: bool = Field(default=True) | |
| qa_cache_similarity_threshold: float = Field(default=0.8, ge=0.0, le=1.0) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/config/settings.py` around lines 83 - 84, The
qa_cache_similarity_threshold Field currently accepts any float and should be
constrained to [0.0, 1.0]; update the Field definition for
qa_cache_similarity_threshold in settings.py (the qa_cache_similarity_threshold
symbol) to enforce bounds (e.g., use Field(default=0.8, ge=0.0, le=1.0) or an
equivalent pydantic constrained type) so values outside 0–1 are
rejected/validated.
| def _get_max_tokens_param(self, max_tokens_value: int) -> dict: | ||
| """Get the appropriate max tokens parameter for the model""" | ||
| if self._use_completion_tokens: | ||
| # Azure AI Inference SDK doesn't support max_completion_tokens | ||
| # For newer models, we omit the parameter and let the model use its default | ||
| return {} | ||
| return {"max_tokens": max_tokens_value} |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
For the azure-ai-inference Python SDK version used here, what is the supported way to cap output tokens for ChatCompletionsClient when a model requires completion-token style limits (e.g., max_completion_tokens)?
💡 Result:
In the azure-ai-inference Python SDK (e.g., ChatCompletionsClient in 1.0.0b9), the supported way is to pass model-specific fields via model_extras, and instruct the service to forward unknown fields by setting the extra-parameters: pass-through header. The SDK documents that model_extras is injected “as-is” at the root of the JSON body and that handling depends on the extra-parameters request header. (azuresdkdocs.z19.web.core.windows.net) The REST API docs describe extra-parameters: pass-through for forwarding such parameters. (learn.microsoft.com)
resp = client.complete(
messages=[...],
# don't use max_tokens for these models
model_extras={"max_completion_tokens": 512},
headers={"extra-parameters": "pass-through"},
)Citations:
- 1: https://azuresdkdocs.z19.web.core.windows.net/python/azure-ai-inference/1.0.0b9/azure.ai.inference.html
- 2: https://learn.microsoft.com/sv-se/rest/api/aifoundry/modelinference/
🏁 Script executed:
# Find the SDK version being used
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "poetry.lock" | head -20Repository: CLoaKY233/Multi-cycle-RAG
Length of output: 85
🏁 Script executed:
# Search for azure-ai-inference version specification
rg "azure-ai-inference" --type py --type txtRepository: CLoaKY233/Multi-cycle-RAG
Length of output: 51
🏁 Script executed:
# Find all usages of _get_max_tokens_param
rg "_get_max_tokens_param" -B 2 -A 5Repository: CLoaKY233/Multi-cycle-RAG
Length of output: 1132
🏁 Script executed:
cat pyproject.tomlRepository: CLoaKY233/Multi-cycle-RAG
Length of output: 778
🏁 Script executed:
# Get broader context around the params.update call
rg "params.update.*_get_max_tokens_param" -B 20 -A 10 src/llm/github_llm.pyRepository: CLoaKY233/Multi-cycle-RAG
Length of output: 1197
🏁 Script executed:
# Find where params are used (likely in a complete() or similar call)
rg "def.*complete|def.*chat" -A 30 src/llm/github_llm.py | head -80Repository: CLoaKY233/Multi-cycle-RAG
Length of output: 1319
🏁 Script executed:
# Check if complete() call shows any header parameter usage in the file
rg "complete\(" -A 5 src/llm/github_llm.py | head -30Repository: CLoaKY233/Multi-cycle-RAG
Length of output: 563
🏁 Script executed:
# Search for any header-related code in the file
rg "header|Header" src/llm/github_llm.pyRepository: CLoaKY233/Multi-cycle-RAG
Length of output: 51
Update the method to support max_completion_tokens for models that require it, or document why limits are being dropped.
The azure-ai-inference SDK (1.0.0b9+) does support max_completion_tokens—contrary to the inline comment—by passing model_extras={"max_completion_tokens": value} with the extra-parameters: pass-through header. Silently returning {} on line 88 discards the caller's token limit entirely, risking unexpectedly long and costly completions. Either implement the proper limit using the supported mechanism, or explicitly document and justify why newer models use unbounded output.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/llm/github_llm.py` around lines 83 - 89, The method _get_max_tokens_param
currently drops token limits when self._use_completion_tokens is True; instead,
return the Azure pass-through form so limits are honored: when
self._use_completion_tokens is True return a dict that supplies model_extras
with the key "max_completion_tokens" set to max_tokens_value (e.g.
{"model_extras": {"max_completion_tokens": max_tokens_value}}), and update the
inline comment to remove the incorrect claim; keep the existing {"max_tokens":
max_tokens_value} return for the non-completion branch and ensure any callers
that set headers include the required "extra-parameters: pass-through" behavior
elsewhere in the codebase.
| response = self.client.complete(**params) # type: ignore[attr-defined] | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
import inspect
try:
from azure.ai.inference import ChatCompletionsClient
print("sync complete signature:", inspect.signature(ChatCompletionsClient.complete))
except Exception as e:
print("sync client import error:", e)
try:
from azure.ai.inference.aio import ChatCompletionsClient as AioChatCompletionsClient
print("aio client available: yes")
print("aio complete signature:", inspect.signature(AioChatCompletionsClient.complete))
print("aio complete is coroutine:", inspect.iscoroutinefunction(AioChatCompletionsClient.complete))
except Exception as e:
print("aio client available: no")
print("aio import error:", e)
PYRepository: CLoaKY233/Multi-cycle-RAG
Length of output: 185
🏁 Script executed:
# Check dependencies
find . -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" | head -5Repository: CLoaKY233/Multi-cycle-RAG
Length of output: 85
🏁 Script executed:
# Examine the file in question
cat -n src/llm/github_llm.py | sed -n '120,210p'Repository: CLoaKY233/Multi-cycle-RAG
Length of output: 4551
🏁 Script executed:
# Check for any async client usage patterns elsewhere
rg "aio|async.*client|ChatCompletionsClient" --type pyRepository: CLoaKY233/Multi-cycle-RAG
Length of output: 214
🏁 Script executed:
cat pyproject.toml | grep -A 20 -B 5 "azure"Repository: CLoaKY233/Multi-cycle-RAG
Length of output: 640
🏁 Script executed:
# Also check if there's any async usage pattern elsewhere in the codebase
rg "async def|await" src/llm/ --type py -B 2 -A 2Repository: CLoaKY233/Multi-cycle-RAG
Length of output: 633
🌐 Web query:
azure-ai-inference 1.0.0 async ChatCompletionsClient aio documentation
💡 Result:
For async chat completions in azure-ai-inference, use the azure.ai.inference.aio.ChatCompletionsClient (the async client lives under the aio namespace). The SDK docs are versioned; the current published docs I can find are for azure-ai-inference 1.0.0b9 (beta), not a GA 1.0.0. [1]
Async client API docs (aio):
azure.ai.inference.aio.ChatCompletionsClient(constructor,async complete(...),async close(), etc.) [2]
PyPI “async” notes (install + import path):
- Install
aiohttp, then import fromazure.ai.inference.aioinstead ofazure.ai.inference. [1]
Minimal async usage (from the documented pattern):
import asyncio
from azure.ai.inference.aio import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential
async def main():
client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
try:
resp = await client.complete(messages=[
SystemMessage("You are a helpful assistant."),
UserMessage("How many feet are in a mile?")
])
print(resp.choices[0].message.content)
finally:
await client.close()
asyncio.run(main())[1][2]
Sources
azure-ai-inferencePyPI README (async client +aiohttp+azure.ai.inference.aioimport). [1]- Azure SDK for Python API reference:
azure.ai.inference.aio.ChatCompletionsClient(asynccomplete, etc.). [2]
🏁 Script executed:
# Check if there are any other LLM clients in the codebase that use async patterns correctly
rg "async for|await.*complete|aio" --type py src/Repository: CLoaKY233/Multi-cycle-RAG
Length of output: 1221
Replace sync client with async client to avoid blocking the event loop.
Lines 130 and 202 execute synchronous client.complete() calls inside async def methods. The azure-ai-inference SDK provides an async variant at azure.ai.inference.aio.ChatCompletionsClient with an async complete() method that must be awaited. The current sync call will block the event loop under concurrent load, degrading performance when multiple streams are active (confirmed usage pattern in src/reflexion/evaluator.py and src/rag/reflexion_engine.py).
Replace with:
from azure.ai.inference.aio import ChatCompletionsClient
# In __init__: self.client = ChatCompletionsClient(...)
# At lines 130, 202: response = await self.client.complete(**params)Also applies to: 202-203
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/llm/github_llm.py` around lines 130 - 131, The current synchronous
client.complete calls block the event loop; switch to the async client by
importing azure.ai.inference.aio.ChatCompletionsClient, instantiate it in the
class __init__ (assign to self.client), and replace synchronous calls to
self.client.complete(...) inside async methods with awaited calls (await
self.client.complete(...)); ensure the client is closed/async-disposed where
appropriate and update any initialization/cleanup logic used by the class that
creates the client.
| # Store in QA semantic cache for future similar questions | ||
| if settings.qa_cache_enabled and final_answer: | ||
| try: | ||
| # Reuse the embedding computed before the reflexion loop | ||
| if question_embedding is None: | ||
| question_embedding = ( | ||
| await self.vector_store.embedding_function.embed_text( | ||
| question | ||
| ) | ||
| ) | ||
|
|
||
| await self.vector_store.store_qa_cache( | ||
| question=question, | ||
| question_embedding=question_embedding, | ||
| answer=final_answer, | ||
| metadata={ | ||
| "total_cycles": len(reflexion_memory.cycles), | ||
| "total_processing_time": reflexion_memory.total_processing_time, | ||
| "final_confidence": reflexion_memory.cycles[ | ||
| -1 | ||
| ].evaluation.confidence_score | ||
| if reflexion_memory.cycles | ||
| else 0.0, | ||
| "web_search_used": any( | ||
| cycle.web_search_enabled | ||
| for cycle in reflexion_memory.cycles | ||
| ), | ||
| }, | ||
| ) | ||
| logger.info("Stored QA cache entry for future similar questions") | ||
| except Exception as e: | ||
| logger.error("Error storing QA cache", error=str(e)) |
There was a problem hiding this comment.
Potential issue: question_embedding could be None when storing QA cache.
If settings.qa_cache_enabled is True but the initial embedding fails (line 152 sets question_embedding = None), the code at line 599 regenerates the embedding. However, if this second embedding call also fails, store_qa_cache would be called with a potentially invalid embedding.
Consider wrapping the entire QA cache storage in a try-except or checking the embedding result:
🛡️ Suggested improvement
if settings.qa_cache_enabled and final_answer:
try:
# Reuse the embedding computed before the reflexion loop
if question_embedding is None:
question_embedding = (
await self.vector_store.embedding_function.embed_text(
question
)
)
+
+ if not question_embedding:
+ logger.warning("Failed to generate question embedding for QA cache")
+ raise ValueError("Empty embedding")
await self.vector_store.store_qa_cache(🧰 Tools
🪛 Ruff (0.15.5)
[warning] 625-625: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/rag/reflexion_engine.py` around lines 595 - 626, The QA cache storage
path can call store_qa_cache with a None embedding if embed_text fails twice;
update the block guarded by settings.qa_cache_enabled / final_answer to ensure a
valid question_embedding before calling self.vector_store.store_qa_cache (e.g.,
attempt embedding via self.vector_store.embedding_function.embed_text and if it
returns None or raises, skip storing and log a warning), or wrap the entire
store_qa_cache call in a try/except that validates question_embedding is not
None before invoking store_qa_cache (refer to symbols question_embedding,
self.vector_store.embedding_function.embed_text,
self.vector_store.store_qa_cache, reflexion_memory, and logger) and include the
actual exception details in logger.error when storage or embedding fails.
| if line.strip().lower() == "\\e": | ||
| console.print("[dim]Multi-line input cancelled.[/dim]") | ||
| # Fall back to normal single-line input | ||
| return self.get_input(prompt) |
There was a problem hiding this comment.
Avoid recursive re-entry from multiline cancel paths.
On Line 115 and Line 128, calling self.get_input(prompt) from inside _get_multiline_input() can build unbounded recursion during repeated cancel cycles.
Suggested fix
- if raw_input.strip().lower() == "\\m":
- # Enter multiline mode
- return self._get_multiline_input(prompt)
+ if raw_input.strip().lower() == "\\m":
+ # Enter multiline mode
+ multiline_result = self._get_multiline_input(prompt)
+ if multiline_result is None:
+ continue
+ return multiline_result- # Fall back to normal single-line input
- return self.get_input(prompt)
+ # Cancel multiline and return control to caller loop
+ return None
...
- return self.get_input(prompt)
+ return NoneAlso applies to: 126-128
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/input_handler.py` around lines 112 - 115, The multiline cancel path
in _get_multiline_input currently re-enters get_input(prompt) recursively, which
can cause unbounded recursion; change the cancel behavior so
_get_multiline_input returns a sentinel (e.g., None or empty string) instead of
calling self.get_input, and update get_input to handle that sentinel by
iterating/re-prompting in a loop (or repeating the input flow) rather than
calling itself recursively. Specifically modify _get_multiline_input (the branch
that checks for "\\e") to return a value indicating "cancelled" and adjust
get_input to detect that return, continue its prompt loop, and present the
single-line prompt again without recursion.
| value = getattr(settings, attr_name, None) | ||
|
|
||
| # Check if value is present and not a placeholder | ||
| is_present = False | ||
| source = "missing" | ||
|
|
||
| if value: | ||
| # For GitHub token, ghp_ prefix is valid | ||
| if attr_name == "github_token" and value.startswith( | ||
| ("ghp_", "github_pat_") | ||
| ): | ||
| is_present = True | ||
| source = "env" | ||
| elif not value.lower().startswith("your_"): | ||
| is_present = True | ||
| source = "env" | ||
|
|
There was a problem hiding this comment.
Trim credential values before presence checks.
On Line 53, whitespace-only strings (e.g. " ") are treated as present, which can produce false “ready” status for required credentials.
Suggested fix
- value = getattr(settings, attr_name, None)
+ raw_value = getattr(settings, attr_name, None)
+ value = raw_value.strip() if isinstance(raw_value, str) else raw_value
...
- if value:
+ if isinstance(value, str) and value:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/preflight.py` around lines 47 - 63, The presence check treats
whitespace-only credential strings as valid; update the logic in the preflight
check (where value = getattr(settings, attr_name, None) and the subsequent
is_present/source determination) to trim string values first (e.g., apply
.strip() to value when it's a str) and then perform the github_token prefix
checks and the "your_" placeholder check against the trimmed value so that
strings composed only of whitespace are considered missing.
| try: | ||
| # Search for similar questions using cosine similarity | ||
| query = f""" | ||
| SELECT id, question, answer, metadata, | ||
| vector::similarity::cosine(question_embedding, {query_embedding}) AS score | ||
| FROM qa_history | ||
| WHERE vector::similarity::cosine(question_embedding, {query_embedding}) >= {threshold} | ||
| ORDER BY score DESC | ||
| LIMIT 1; | ||
| """ |
There was a problem hiding this comment.
SQL injection vector through string interpolation.
The static analysis flagged S608 here. While query_embedding is a List[float] and threshold is a float (mitigating practical injection risk), directly interpolating Python values into query strings is a code smell and could become problematic if types aren't validated upstream.
Consider using SurrealDB's parameterized query syntax if supported, or add explicit type validation:
🛡️ Defensive validation approach
async def lookup_qa_cache(
self,
query_embedding: List[float],
threshold: float = 0.85,
) -> Dict[str, Any] | None:
"""Lookup similar question in cache by embedding similarity"""
await self._ensure_connection()
if not self.client:
raise VectorStoreException("Client not connected")
+ # Validate types to prevent injection
+ if not isinstance(threshold, (int, float)) or not (0.0 <= threshold <= 1.0):
+ raise ValueError("threshold must be a float between 0.0 and 1.0")
+ if not isinstance(query_embedding, list) or not all(isinstance(x, (int, float)) for x in query_embedding):
+ raise ValueError("query_embedding must be a list of floats")
+
try:
# Search for similar questions using cosine similarity
query = f"""🧰 Tools
🪛 Ruff (0.15.5)
[error] 480-487: Possible SQL injection vector through string-based query construction
(S608)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/vectorstore/surrealdb_store.py` around lines 478 - 487, The SQL built
into the query variable currently interpolates Python values directly
(query_embedding and threshold) which risks injection; change this to a
parameterized SurrealDB query using placeholders (e.g. $query_embedding and
$threshold) instead of f-string interpolation and pass a validated params dict
when executing, or—if SurrealDB parameterization isn't available—perform strict
defensive validation/conversion: ensure query_embedding is explicitly converted
to a List[float] and threshold to a float before embedding, and only then format
a safe, canonical representation; update the call that executes this query to
use the new params and reference the qa_history table and
vector::similarity::cosine usage accordingly.
| word_count = len(full_content.split()) if full_content else 0 | ||
|
|
||
| min_length = settings.web_search_min_content_length | ||
| if word_count > 0 and len(full_content) < min_length: | ||
| status = WebSearchStatus.LOW_QUALITY | ||
| else: | ||
| status = WebSearchStatus.SUCCESS |
There was a problem hiding this comment.
Edge case: word_count is zero when full_content is falsy, but status defaults to SUCCESS.
When full_content is empty/None, word_count will be 0, but the condition word_count > 0 and len(full_content) < min_length is False, so status defaults to SUCCESS. An empty result marked as SUCCESS may be misleading.
♻️ Suggested fix
word_count = len(full_content.split()) if full_content else 0
min_length = settings.web_search_min_content_length
- if word_count > 0 and len(full_content) < min_length:
+ if not full_content or word_count == 0:
+ status = WebSearchStatus.LOW_QUALITY
+ elif len(full_content) < min_length:
status = WebSearchStatus.LOW_QUALITY
else:
status = WebSearchStatus.SUCCESS🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/websearch/tavily_search.py` around lines 95 - 101, The current logic can
mark empty or None full_content as SUCCESS; update the branching around
word_count/full_content so empty content is explicitly handled: if full_content
is falsy or word_count == 0 set status to WebSearchStatus.NO_CONTENT (or
WebSearchStatus.LOW_QUALITY if NO_CONTENT doesn't exist), else compute
min_length = settings.web_search_min_content_length and if len(full_content) <
min_length set status = WebSearchStatus.LOW_QUALITY, otherwise set status =
WebSearchStatus.SUCCESS; adjust the block around variables full_content,
word_count, min_length, and the WebSearchStatus assignments accordingly.
This pull request introduces several improvements and new features across configuration, database schema, developer tooling, and prompt templates. The key updates include switching web search integration from Google to Tavily, adding semantic QA cache support, enhancing developer and validation workflows, and refining prompt generation guidelines.
Web Search Integration and Configuration:
.env.exampleandsrc/config/settings.pyto useTAVILY_API_KEYand related Tavily-specific settings. Added a diagnostic scriptcheck_tavily.pyto verify Tavily integration. [1] [2] [3]Semantic QA Cache Support:
qa_historytable, a similarity search functionfn::qa_cache_lookup, and related config options to enable and set threshold for cache lookups. [1] [2] [3]Database Schema Improvements:
IF NOT EXISTStodocumentsandweb_searchtables, and removed unused HNSW index definitions for cleaner SurrealDB schema migrations. [1] [2]Developer Tooling and Validation:
.factory/services.yamlwith install/test/lint commands,.factory/init.shfor environment sync, user testing documentation, and performance/parallelism validation results and review. [1] [2] [3] [4] [5]Prompt Template and Dependency Updates:
tavily-pythonand optional developer dependencies (pytest,pytest-asyncio), and improved Pyright and pytest configuration.Other Minor Improvements:
src/__init__.pyto use absolute imports.embedding_functionin the vector store interface for better extensibility.Let me know if you want a deeper dive into any of these areas!
Summary by CodeRabbit
New Features
Chores