Skip to content

Migrate web search to Tavily and refactor imports#5

Open
CLoaKY233 wants to merge 4 commits into
mainfrom
migrate_to_tavily
Open

Migrate web search to Tavily and refactor imports#5
CLoaKY233 wants to merge 4 commits into
mainfrom
migrate_to_tavily

Conversation

@CLoaKY233

@CLoaKY233 CLoaKY233 commented Mar 15, 2026

Copy link
Copy Markdown
Owner

This pull request introduces several improvements and new features across configuration, database schema, developer tooling, and prompt templates. The key updates include switching web search integration from Google to Tavily, adding semantic QA cache support, enhancing developer and validation workflows, and refining prompt generation guidelines.

Web Search Integration and Configuration:

  • Switched web search provider from Google Custom Search to Tavily, updating .env.example and src/config/settings.py to use TAVILY_API_KEY and related Tavily-specific settings. Added a diagnostic script check_tavily.py to verify Tavily integration. [1] [2] [3]

Semantic QA Cache Support:

  • Added schema and SurrealDB function for a semantic QA cache: new qa_history table, a similarity search function fn::qa_cache_lookup, and related config options to enable and set threshold for cache lookups. [1] [2] [3]

Database Schema Improvements:

  • Made table creation idempotent by adding IF NOT EXISTS to documents and web_search tables, and removed unused HNSW index definitions for cleaner SurrealDB schema migrations. [1] [2]

Developer Tooling and Validation:

  • Added developer and validation infrastructure: .factory/services.yaml with install/test/lint commands, .factory/init.sh for environment sync, user testing documentation, and performance/parallelism validation results and review. [1] [2] [3] [4] [5]
  • Added a detailed skill description for the backend worker, outlining procedures for database, Python, and CLI enhancements.

Prompt Template and Dependency Updates:

  • Refined prompt templates to enforce complete, self-contained responses, and clarified formatting for the "Sources" section. [1] [2] [3] [4] [5]
  • Updated dependencies to include tavily-python and optional developer dependencies (pytest, pytest-asyncio), and improved Pyright and pytest configuration.

Other Minor Improvements:

  • Fixed import paths in src/__init__.py to use absolute imports.
  • Added an abstract property for embedding_function in the vector store interface for better extensibility.

Let me know if you want a deeper dive into any of these areas!

Summary by CodeRabbit

  • New Features

    • Web search now uses Tavily API for improved search results
    • Added QA semantic caching to reuse similar answers and reduce processing time
    • Enhanced CLI with interactive runtime configuration menu for web search, reflexion cycles, confidence thresholds, and cache controls
    • Added pre-flight credential validation checks for required services
  • Chores

    • Updated dependencies to support Tavily integration and testing framework

@CLoaKY233 CLoaKY233 self-assigned this Mar 15, 2026
Copilot AI review requested due to automatic review settings March 15, 2026 10:01
@coderabbitai

coderabbitai Bot commented Mar 15, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR introduces Tavily-based web search replacing Google Custom Search, implements QA semantic caching, adds runtime configuration support, refactors imports to absolute paths, creates a preflight credential validation system, and enhances the interactive CLI with dynamic runtime adjustments and status reporting.

Changes

Cohort / File(s) Summary
Web Search Migration
.env.example, src/websearch/__init__.py, src/websearch/google_search.py, src/websearch/tavily_search.py, check_tavily.py
Replaces Google Custom Search API with Tavily API integration. Removes google_api_key/google_cse_id, adds TAVILY_API_KEY with new web search controls (content extraction, title length, content length, timeout). Adds diagnostic script for Tavily configuration validation. GoogleWebSearch class deleted; TavilyWebSearch added with native API integration and lazy client initialization.
Configuration & Settings
src/config/settings.py, pyproject.toml
Replaces Google credentials (google_api_key, google_cse_id) with tavily_api_key. Adds QA cache settings (qa_cache_enabled, qa_cache_similarity_threshold). Adds tavily-python>=0.7.23 dependency and pytest/pytest-asyncio dev dependencies. Adds pytest configuration with asyncio_mode and testpaths.
Database Schema
schema/documents.surql, schema/web_search.surql, schema/qa_history.surql, schema/qa_cache_lookup.surql
Converts table definitions to IF NOT EXISTS form. Removes HNSW vector indexes from documents and web_search tables. Adds new qa_history table with question embeddings, answers, and metadata for semantic caching. Adds qa_cache_lookup SurrealQL function for similarity-based lookup.
Core Engine & RAG
src/rag/engine.py, src/rag/reflexion_engine.py, src/core/interfaces.py
Adds public wrapper methods to RAGEngine for runtime configuration (web_search_mode, max_cycles, confidence_threshold, qa_cache control). Adds QA cache and web search deletion methods. ReflexionRAGEngine replaces GoogleWebSearch with TavilyWebSearch, implements QA cache lookup during query processing, parallelizes DB/web retrieval via asyncio.gather, stores QA pairs with embeddings and metadata. Enhances VectorStoreInterface with embedding_function property, store_qa_cache, lookup_qa_cache, delete_all_web_searches, clear_qa_cache methods.
Vector Store Implementation
src/vectorstore/surrealdb_store.py
Implements QA cache methods (store_qa_cache, lookup_qa_cache, clear_qa_cache). Exposes embedding_function property. Removes HNSW index initialization from schema setup. Parallelizes similarity_search_combined using asyncio.gather. Adds QA history table initialization with question embeddings and metadata fields.
Import Path Refactoring
src/__init__.py, src/data/loader.py, src/data/processor.py, src/embeddings/github_embeddings.py, src/llm/github_llm.py, src/memory/cache.py, src/reflexion/evaluator.py
Converts relative imports to absolute src.* imports across data, embeddings, LLM, memory, and reflexion modules. Updates public API in src/init.py to export Document and settings alongside RAGEngine.
LLM Model Capabilities
src/llm/github_llm.py
Adds model capability detection methods (_requires_completion_tokens, _model_supports_temperature, _is_temperature_error, _get_max_tokens_param, _build_params). Implements automatic retry logic in generate_stream and chat_stream to handle models with unsupported temperature parameter. Maintains streaming behavior with usage tracking.
Embedding Async Support
src/embeddings/github_embeddings.py
Wraps synchronous embedding client calls with asyncio.to_thread for asynchronous execution in embed_text and embed_documents methods.
Interactive CLI Enhancements
rag.py, src/utils/input_handler.py, src/utils/preflight.py
Introduces RuntimeConfig class to manage runtime settings. Enhances InteractiveRAGChat with status bar rendering, dynamic runtime configuration menu, and input handler integration. Adds CLI options for docs path, ingestion, web search control, reflexion cycles, threshold, cache management, and model override. Implements preflight credential validation (PreflightChecker) for required (GitHub, SurrealDB) and optional (Tavily) credentials. Adds check CLI command for pre-flight validation. Implements MultiLineInput and EnhancedInputHandler for improved interactive input with history and multiline support.
Prompt Templates
prompts/generation/initial_generation.yaml, prompts/generation/reflexion_generation.yaml, prompts/generation/simple_generation.yaml
Adds explicit guidelines requiring complete, self-contained responses that never end mid-sentence or mid-list. Adds guideline that Sources section must be final and each source entry must end with a period. Ensures consistent formatting constraints across all generation prompt variants.
Testing
tests/test_parallelism.py
Adds comprehensive test suite validating asyncio concurrency patterns, absence of sleep calls in main loops, use of asyncio.gather in parallel operations, and timing comparisons of concurrent vs. sequential execution. Includes AST-based analysis tests to verify parallelization patterns.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A web search hops from Google to Tavily's door,
Caching questions with embeddings galore!
Runtime configs dance in menus bright,
Preflight checks ensure credentials are right.
Asyncio gathers in parallel flight—
bounce bounce ✨ The RAG engine takes flight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Migrate web search to Tavily and refactor imports' accurately and concisely summarizes the two main components of this changeset: the migration from Google Custom Search to Tavily, and the refactoring of import paths from relative to absolute throughout the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 84.55% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch migrate_to_tavily
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can scan for known vulnerabilities in your dependencies using OSV Scanner.

OSV Scanner will automatically detect and report security vulnerabilities in your project's dependencies. No additional configuration is required.

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (4)
src/utils/preflight.py (1)

28-37: Prefer immutable credential definitions at class scope.

REQUIRED_CREDENTIALS and OPTIONAL_CREDENTIALS are constants semantically; using tuples avoids accidental runtime mutation.

Suggested refactor
-    REQUIRED_CREDENTIALS = [
+    REQUIRED_CREDENTIALS = (
         ("github_token", "GitHub Token", "Required for LLM and embeddings"),
         ("surrealdb_url", "SurrealDB URL", "Database connection"),
         ("surrealdb_user", "SurrealDB Username", "Database authentication"),
         ("surrealdb_pass", "SurrealDB Password", "Database authentication"),
-    ]
+    )

-    OPTIONAL_CREDENTIALS = [
+    OPTIONAL_CREDENTIALS = (
         ("tavily_api_key", "Tavily API Key", "Required for web search"),
-    ]
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/preflight.py` around lines 28 - 37, REQUIRED_CREDENTIALS and
OPTIONAL_CREDENTIALS are defined as mutable lists; change them to immutable
tuples and move them to the appropriate class as class-level constants (e.g.,
Preflight.REQUIRED_CREDENTIALS and Preflight.OPTIONAL_CREDENTIALS) so they
cannot be mutated at runtime; keep the same tuple entries and names to preserve
usage.
src/vectorstore/surrealdb_store.py (1)

23-26: Return type mismatch with interface declaration.

The interface declares embedding_function returns EmbeddingInterface, but this implementation returns GithubEmbeddings. This works at runtime (assuming GithubEmbeddings is a subclass), but the type annotation should match the abstract declaration for consistency.

♻️ Suggested fix
+from src.core.interfaces import EmbeddingInterface
+
     `@property`
-    def embedding_function(self) -> GithubEmbeddings:
+    def embedding_function(self) -> EmbeddingInterface:
         """Get the embedding function used by this vector store"""
         return self._embedding_function
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/vectorstore/surrealdb_store.py` around lines 23 - 26, The
embedding_function property currently types its return as GithubEmbeddings but
the interface declares EmbeddingInterface; update the annotation on the
embedding_function property to return EmbeddingInterface (and ensure the
_embedding_function field is typed accordingly) so the signature matches the
abstract/interface declaration (replace GithubEmbeddings -> EmbeddingInterface
in the property return type and related attribute type hints).
tests/test_parallelism.py (2)

58-64: _has_gather_in_function doesn't verify the call is asyncio.gather.

Similar to the sleep check, this only verifies .gather attribute access. A false positive would occur if any other module's .gather() method exists in the function.

♻️ Suggested improvement
 def _has_gather_in_function(func_node: ast.AST) -> bool:
     """Return True if *func_node* contains an asyncio.gather call."""
     for node in ast.walk(func_node):
         if isinstance(node, ast.Call):
-            if isinstance(node.func, ast.Attribute) and node.func.attr == "gather":
-                return True
+            if isinstance(node.func, ast.Attribute) and node.func.attr == "gather":
+                # Verify it's asyncio.gather
+                if isinstance(node.func.value, ast.Name) and node.func.value.id == "asyncio":
+                    return True
     return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_parallelism.py` around lines 58 - 64, The function
_has_gather_in_function currently only checks for an Attribute named "gather"
and can false-positive on other modules; update it to ensure the call target is
specifically asyncio.gather by checking that node is an ast.Call, node.func is
an ast.Attribute with attr == "gather", and node.func.value is an ast.Name with
id == "asyncio" (or handle a dotted attribute chain where the base Name is
"asyncio"), returning True only in that case; use the existing symbols ast.Call,
node.func, ast.Attribute, and _has_gather_in_function to locate and change the
logic.

42-55: _collect_sleep_calls may produce false positives for non-asyncio sleep calls.

The helper only checks func.attr == "sleep" without verifying the value is actually asyncio. This could match time.sleep(...), some_lib.sleep(...), or any other .sleep() attribute call.

Consider tightening the check:

♻️ Suggested improvement
 def _collect_sleep_calls(tree: ast.AST) -> List[ast.Await]:
     """Return all AST Await nodes that call asyncio.sleep(...)."""
     sleeps = []
     for node in ast.walk(tree):
         if not isinstance(node, ast.Await):
             continue
         call = node.value
         if not isinstance(call, ast.Call):
             continue
         func = call.func
         # Accept both `asyncio.sleep(...)` (Attribute) forms
-        if isinstance(func, ast.Attribute) and func.attr == "sleep":
-            sleeps.append(node)
+        if isinstance(func, ast.Attribute) and func.attr == "sleep":
+            # Verify it's asyncio.sleep specifically
+            if isinstance(func.value, ast.Name) and func.value.id == "asyncio":
+                sleeps.append(node)
     return sleeps
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_parallelism.py` around lines 42 - 55, The helper
_collect_sleep_calls is currently matching any attribute named "sleep" (e.g.,
time.sleep) and should be tightened to only match asyncio.sleep; update the
conditional inside _collect_sleep_calls so it requires isinstance(func,
ast.Attribute) and func.attr == "sleep" and that func.value is an ast.Name with
id == "asyncio" (i.e., check isinstance(func.value, ast.Name) and func.value.id
== "asyncio") so only asyncio.sleep(Attribute) calls are collected; if you need
to also support "from asyncio import sleep" cases, handle ast.Name(func)
separately but only after resolving imports/aliases to confirm it comes from
asyncio.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@check_tavily.py`:
- Around line 50-56: The check_settings() block currently accepts any non-empty
settings.tavily_api_key (causing false positives for placeholder values like
"your_tavily_api_key_here"); update check_settings() to trim whitespace and
reject known placeholder patterns (e.g., exact matches like
"your_tavily_api_key_here" or values that start with "your_" or contain
"example" or "placeholder") before returning True; when rejecting, print the
existing error message (or a slightly more specific one) and return False so
only real API keys pass.

In `@src/config/settings.py`:
- Around line 83-84: The qa_cache_similarity_threshold Field currently accepts
any float and should be constrained to [0.0, 1.0]; update the Field definition
for qa_cache_similarity_threshold in settings.py (the
qa_cache_similarity_threshold symbol) to enforce bounds (e.g., use
Field(default=0.8, ge=0.0, le=1.0) or an equivalent pydantic constrained type)
so values outside 0–1 are rejected/validated.

In `@src/llm/github_llm.py`:
- Around line 130-131: The current synchronous client.complete calls block the
event loop; switch to the async client by importing
azure.ai.inference.aio.ChatCompletionsClient, instantiate it in the class
__init__ (assign to self.client), and replace synchronous calls to
self.client.complete(...) inside async methods with awaited calls (await
self.client.complete(...)); ensure the client is closed/async-disposed where
appropriate and update any initialization/cleanup logic used by the class that
creates the client.
- Around line 83-89: The method _get_max_tokens_param currently drops token
limits when self._use_completion_tokens is True; instead, return the Azure
pass-through form so limits are honored: when self._use_completion_tokens is
True return a dict that supplies model_extras with the key
"max_completion_tokens" set to max_tokens_value (e.g. {"model_extras":
{"max_completion_tokens": max_tokens_value}}), and update the inline comment to
remove the incorrect claim; keep the existing {"max_tokens": max_tokens_value}
return for the non-completion branch and ensure any callers that set headers
include the required "extra-parameters: pass-through" behavior elsewhere in the
codebase.

In `@src/rag/reflexion_engine.py`:
- Around line 595-626: The QA cache storage path can call store_qa_cache with a
None embedding if embed_text fails twice; update the block guarded by
settings.qa_cache_enabled / final_answer to ensure a valid question_embedding
before calling self.vector_store.store_qa_cache (e.g., attempt embedding via
self.vector_store.embedding_function.embed_text and if it returns None or
raises, skip storing and log a warning), or wrap the entire store_qa_cache call
in a try/except that validates question_embedding is not None before invoking
store_qa_cache (refer to symbols question_embedding,
self.vector_store.embedding_function.embed_text,
self.vector_store.store_qa_cache, reflexion_memory, and logger) and include the
actual exception details in logger.error when storage or embedding fails.

In `@src/utils/input_handler.py`:
- Around line 112-115: The multiline cancel path in _get_multiline_input
currently re-enters get_input(prompt) recursively, which can cause unbounded
recursion; change the cancel behavior so _get_multiline_input returns a sentinel
(e.g., None or empty string) instead of calling self.get_input, and update
get_input to handle that sentinel by iterating/re-prompting in a loop (or
repeating the input flow) rather than calling itself recursively. Specifically
modify _get_multiline_input (the branch that checks for "\\e") to return a value
indicating "cancelled" and adjust get_input to detect that return, continue its
prompt loop, and present the single-line prompt again without recursion.

In `@src/utils/preflight.py`:
- Around line 47-63: The presence check treats whitespace-only credential
strings as valid; update the logic in the preflight check (where value =
getattr(settings, attr_name, None) and the subsequent is_present/source
determination) to trim string values first (e.g., apply .strip() to value when
it's a str) and then perform the github_token prefix checks and the "your_"
placeholder check against the trimmed value so that strings composed only of
whitespace are considered missing.

In `@src/vectorstore/surrealdb_store.py`:
- Around line 478-487: The SQL built into the query variable currently
interpolates Python values directly (query_embedding and threshold) which risks
injection; change this to a parameterized SurrealDB query using placeholders
(e.g. $query_embedding and $threshold) instead of f-string interpolation and
pass a validated params dict when executing, or—if SurrealDB parameterization
isn't available—perform strict defensive validation/conversion: ensure
query_embedding is explicitly converted to a List[float] and threshold to a
float before embedding, and only then format a safe, canonical representation;
update the call that executes this query to use the new params and reference the
qa_history table and vector::similarity::cosine usage accordingly.

In `@src/websearch/tavily_search.py`:
- Around line 95-101: The current logic can mark empty or None full_content as
SUCCESS; update the branching around word_count/full_content so empty content is
explicitly handled: if full_content is falsy or word_count == 0 set status to
WebSearchStatus.NO_CONTENT (or WebSearchStatus.LOW_QUALITY if NO_CONTENT doesn't
exist), else compute min_length = settings.web_search_min_content_length and if
len(full_content) < min_length set status = WebSearchStatus.LOW_QUALITY,
otherwise set status = WebSearchStatus.SUCCESS; adjust the block around
variables full_content, word_count, min_length, and the WebSearchStatus
assignments accordingly.

---

Nitpick comments:
In `@src/utils/preflight.py`:
- Around line 28-37: REQUIRED_CREDENTIALS and OPTIONAL_CREDENTIALS are defined
as mutable lists; change them to immutable tuples and move them to the
appropriate class as class-level constants (e.g., Preflight.REQUIRED_CREDENTIALS
and Preflight.OPTIONAL_CREDENTIALS) so they cannot be mutated at runtime; keep
the same tuple entries and names to preserve usage.

In `@src/vectorstore/surrealdb_store.py`:
- Around line 23-26: The embedding_function property currently types its return
as GithubEmbeddings but the interface declares EmbeddingInterface; update the
annotation on the embedding_function property to return EmbeddingInterface (and
ensure the _embedding_function field is typed accordingly) so the signature
matches the abstract/interface declaration (replace GithubEmbeddings ->
EmbeddingInterface in the property return type and related attribute type
hints).

In `@tests/test_parallelism.py`:
- Around line 58-64: The function _has_gather_in_function currently only checks
for an Attribute named "gather" and can false-positive on other modules; update
it to ensure the call target is specifically asyncio.gather by checking that
node is an ast.Call, node.func is an ast.Attribute with attr == "gather", and
node.func.value is an ast.Name with id == "asyncio" (or handle a dotted
attribute chain where the base Name is "asyncio"), returning True only in that
case; use the existing symbols ast.Call, node.func, ast.Attribute, and
_has_gather_in_function to locate and change the logic.
- Around line 42-55: The helper _collect_sleep_calls is currently matching any
attribute named "sleep" (e.g., time.sleep) and should be tightened to only match
asyncio.sleep; update the conditional inside _collect_sleep_calls so it requires
isinstance(func, ast.Attribute) and func.attr == "sleep" and that func.value is
an ast.Name with id == "asyncio" (i.e., check isinstance(func.value, ast.Name)
and func.value.id == "asyncio") so only asyncio.sleep(Attribute) calls are
collected; if you need to also support "from asyncio import sleep" cases, handle
ast.Name(func) separately but only after resolving imports/aliases to confirm it
comes from asyncio.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 880b9a68-fb05-40ea-84f0-4439d5e95e69

📥 Commits

Reviewing files that changed from the base of the PR and between d59ff5e and 81937cb.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • .env.example
  • check_tavily.py
  • pyproject.toml
  • rag.py
  • schema/qa_cache_lookup.surql
  • src/config/settings.py
  • src/core/interfaces.py
  • src/embeddings/github_embeddings.py
  • src/llm/github_llm.py
  • src/rag/engine.py
  • src/rag/reflexion_engine.py
  • src/utils/input_handler.py
  • src/utils/preflight.py
  • src/vectorstore/surrealdb_store.py
  • src/websearch/tavily_search.py
  • tests/test_parallelism.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • schema/qa_cache_lookup.surql

Comment thread check_tavily.py
Comment on lines +50 to +56
key = settings.tavily_api_key
if key:
print(" OK : settings.tavily_api_key loaded")
return True
else:
print(" ERROR: settings.tavily_api_key is empty.")
return False

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Apply placeholder filtering in settings check as well.

On Line 51, check_settings() treats any non-empty value as valid, including your_tavily_api_key_here, so this step can report a false positive.

Suggested fix
-        if key:
+        if key and key != "your_tavily_api_key_here":
             print("  OK   : settings.tavily_api_key loaded")
             return True
         else:
             print("  ERROR: settings.tavily_api_key is empty.")
             return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@check_tavily.py` around lines 50 - 56, The check_settings() block currently
accepts any non-empty settings.tavily_api_key (causing false positives for
placeholder values like "your_tavily_api_key_here"); update check_settings() to
trim whitespace and reject known placeholder patterns (e.g., exact matches like
"your_tavily_api_key_here" or values that start with "your_" or contain
"example" or "placeholder") before returning True; when rejecting, print the
existing error message (or a slightly more specific one) and return False so
only real API keys pass.

Comment thread src/config/settings.py
Comment on lines +83 to +84
qa_cache_enabled: bool = Field(default=True)
qa_cache_similarity_threshold: float = Field(default=0.8)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Constrain QA cache similarity threshold range.

On Line 84, qa_cache_similarity_threshold accepts any float. Values outside [0.0, 1.0] can silently break cache hit behavior.

Suggested fix
-    qa_cache_similarity_threshold: float = Field(default=0.8)
+    qa_cache_similarity_threshold: float = Field(default=0.8, ge=0.0, le=1.0)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
qa_cache_enabled: bool = Field(default=True)
qa_cache_similarity_threshold: float = Field(default=0.8)
qa_cache_enabled: bool = Field(default=True)
qa_cache_similarity_threshold: float = Field(default=0.8, ge=0.0, le=1.0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/settings.py` around lines 83 - 84, The
qa_cache_similarity_threshold Field currently accepts any float and should be
constrained to [0.0, 1.0]; update the Field definition for
qa_cache_similarity_threshold in settings.py (the qa_cache_similarity_threshold
symbol) to enforce bounds (e.g., use Field(default=0.8, ge=0.0, le=1.0) or an
equivalent pydantic constrained type) so values outside 0–1 are
rejected/validated.

Comment thread src/llm/github_llm.py
Comment on lines +83 to +89
def _get_max_tokens_param(self, max_tokens_value: int) -> dict:
"""Get the appropriate max tokens parameter for the model"""
if self._use_completion_tokens:
# Azure AI Inference SDK doesn't support max_completion_tokens
# For newer models, we omit the parameter and let the model use its default
return {}
return {"max_tokens": max_tokens_value}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For the azure-ai-inference Python SDK version used here, what is the supported way to cap output tokens for ChatCompletionsClient when a model requires completion-token style limits (e.g., max_completion_tokens)?

💡 Result:

In the azure-ai-inference Python SDK (e.g., ChatCompletionsClient in 1.0.0b9), the supported way is to pass model-specific fields via model_extras, and instruct the service to forward unknown fields by setting the extra-parameters: pass-through header. The SDK documents that model_extras is injected “as-is” at the root of the JSON body and that handling depends on the extra-parameters request header. (azuresdkdocs.z19.web.core.windows.net) The REST API docs describe extra-parameters: pass-through for forwarding such parameters. (learn.microsoft.com)

resp = client.complete(
    messages=[...],
    # don't use max_tokens for these models
    model_extras={"max_completion_tokens": 512},
    headers={"extra-parameters": "pass-through"},
)

Citations:


🏁 Script executed:

# Find the SDK version being used
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "poetry.lock" | head -20

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 85


🏁 Script executed:

# Search for azure-ai-inference version specification
rg "azure-ai-inference" --type py --type txt

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 51


🏁 Script executed:

# Find all usages of _get_max_tokens_param
rg "_get_max_tokens_param" -B 2 -A 5

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 1132


🏁 Script executed:

cat pyproject.toml

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 778


🏁 Script executed:

# Get broader context around the params.update call
rg "params.update.*_get_max_tokens_param" -B 20 -A 10 src/llm/github_llm.py

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 1197


🏁 Script executed:

# Find where params are used (likely in a complete() or similar call)
rg "def.*complete|def.*chat" -A 30 src/llm/github_llm.py | head -80

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 1319


🏁 Script executed:

# Check if complete() call shows any header parameter usage in the file
rg "complete\(" -A 5 src/llm/github_llm.py | head -30

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 563


🏁 Script executed:

# Search for any header-related code in the file
rg "header|Header" src/llm/github_llm.py

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 51


Update the method to support max_completion_tokens for models that require it, or document why limits are being dropped.

The azure-ai-inference SDK (1.0.0b9+) does support max_completion_tokens—contrary to the inline comment—by passing model_extras={"max_completion_tokens": value} with the extra-parameters: pass-through header. Silently returning {} on line 88 discards the caller's token limit entirely, risking unexpectedly long and costly completions. Either implement the proper limit using the supported mechanism, or explicitly document and justify why newer models use unbounded output.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/github_llm.py` around lines 83 - 89, The method _get_max_tokens_param
currently drops token limits when self._use_completion_tokens is True; instead,
return the Azure pass-through form so limits are honored: when
self._use_completion_tokens is True return a dict that supplies model_extras
with the key "max_completion_tokens" set to max_tokens_value (e.g.
{"model_extras": {"max_completion_tokens": max_tokens_value}}), and update the
inline comment to remove the incorrect claim; keep the existing {"max_tokens":
max_tokens_value} return for the non-completion branch and ensure any callers
that set headers include the required "extra-parameters: pass-through" behavior
elsewhere in the codebase.

Comment thread src/llm/github_llm.py
Comment on lines +130 to +131
response = self.client.complete(**params) # type: ignore[attr-defined]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import inspect
try:
    from azure.ai.inference import ChatCompletionsClient
    print("sync complete signature:", inspect.signature(ChatCompletionsClient.complete))
except Exception as e:
    print("sync client import error:", e)

try:
    from azure.ai.inference.aio import ChatCompletionsClient as AioChatCompletionsClient
    print("aio client available: yes")
    print("aio complete signature:", inspect.signature(AioChatCompletionsClient.complete))
    print("aio complete is coroutine:", inspect.iscoroutinefunction(AioChatCompletionsClient.complete))
except Exception as e:
    print("aio client available: no")
    print("aio import error:", e)
PY

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 185


🏁 Script executed:

# Check dependencies
find . -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" | head -5

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 85


🏁 Script executed:

# Examine the file in question
cat -n src/llm/github_llm.py | sed -n '120,210p'

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 4551


🏁 Script executed:

# Check for any async client usage patterns elsewhere
rg "aio|async.*client|ChatCompletionsClient" --type py

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 214


🏁 Script executed:

cat pyproject.toml | grep -A 20 -B 5 "azure"

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 640


🏁 Script executed:

# Also check if there's any async usage pattern elsewhere in the codebase
rg "async def|await" src/llm/ --type py -B 2 -A 2

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 633


🌐 Web query:

azure-ai-inference 1.0.0 async ChatCompletionsClient aio documentation

💡 Result:

For async chat completions in azure-ai-inference, use the azure.ai.inference.aio.ChatCompletionsClient (the async client lives under the aio namespace). The SDK docs are versioned; the current published docs I can find are for azure-ai-inference 1.0.0b9 (beta), not a GA 1.0.0. [1]

Async client API docs (aio):

  • azure.ai.inference.aio.ChatCompletionsClient (constructor, async complete(...), async close(), etc.) [2]

PyPI “async” notes (install + import path):

  • Install aiohttp, then import from azure.ai.inference.aio instead of azure.ai.inference. [1]

Minimal async usage (from the documented pattern):

import asyncio
from azure.ai.inference.aio import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

async def main():
    client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    try:
        resp = await client.complete(messages=[
            SystemMessage("You are a helpful assistant."),
            UserMessage("How many feet are in a mile?")
        ])
        print(resp.choices[0].message.content)
    finally:
        await client.close()

asyncio.run(main())

[1][2]

Sources

  1. azure-ai-inference PyPI README (async client + aiohttp + azure.ai.inference.aio import). [1]
  2. Azure SDK for Python API reference: azure.ai.inference.aio.ChatCompletionsClient (async complete, etc.). [2]

🏁 Script executed:

# Check if there are any other LLM clients in the codebase that use async patterns correctly
rg "async for|await.*complete|aio" --type py src/

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 1221


Replace sync client with async client to avoid blocking the event loop.

Lines 130 and 202 execute synchronous client.complete() calls inside async def methods. The azure-ai-inference SDK provides an async variant at azure.ai.inference.aio.ChatCompletionsClient with an async complete() method that must be awaited. The current sync call will block the event loop under concurrent load, degrading performance when multiple streams are active (confirmed usage pattern in src/reflexion/evaluator.py and src/rag/reflexion_engine.py).

Replace with:

from azure.ai.inference.aio import ChatCompletionsClient
# In __init__: self.client = ChatCompletionsClient(...)
# At lines 130, 202: response = await self.client.complete(**params)

Also applies to: 202-203

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/github_llm.py` around lines 130 - 131, The current synchronous
client.complete calls block the event loop; switch to the async client by
importing azure.ai.inference.aio.ChatCompletionsClient, instantiate it in the
class __init__ (assign to self.client), and replace synchronous calls to
self.client.complete(...) inside async methods with awaited calls (await
self.client.complete(...)); ensure the client is closed/async-disposed where
appropriate and update any initialization/cleanup logic used by the class that
creates the client.

Comment on lines +595 to +626
# Store in QA semantic cache for future similar questions
if settings.qa_cache_enabled and final_answer:
try:
# Reuse the embedding computed before the reflexion loop
if question_embedding is None:
question_embedding = (
await self.vector_store.embedding_function.embed_text(
question
)
)

await self.vector_store.store_qa_cache(
question=question,
question_embedding=question_embedding,
answer=final_answer,
metadata={
"total_cycles": len(reflexion_memory.cycles),
"total_processing_time": reflexion_memory.total_processing_time,
"final_confidence": reflexion_memory.cycles[
-1
].evaluation.confidence_score
if reflexion_memory.cycles
else 0.0,
"web_search_used": any(
cycle.web_search_enabled
for cycle in reflexion_memory.cycles
),
},
)
logger.info("Stored QA cache entry for future similar questions")
except Exception as e:
logger.error("Error storing QA cache", error=str(e))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential issue: question_embedding could be None when storing QA cache.

If settings.qa_cache_enabled is True but the initial embedding fails (line 152 sets question_embedding = None), the code at line 599 regenerates the embedding. However, if this second embedding call also fails, store_qa_cache would be called with a potentially invalid embedding.

Consider wrapping the entire QA cache storage in a try-except or checking the embedding result:

🛡️ Suggested improvement
             if settings.qa_cache_enabled and final_answer:
                 try:
                     # Reuse the embedding computed before the reflexion loop
                     if question_embedding is None:
                         question_embedding = (
                             await self.vector_store.embedding_function.embed_text(
                                 question
                             )
                         )
+                    
+                    if not question_embedding:
+                        logger.warning("Failed to generate question embedding for QA cache")
+                        raise ValueError("Empty embedding")

                     await self.vector_store.store_qa_cache(
🧰 Tools
🪛 Ruff (0.15.5)

[warning] 625-625: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rag/reflexion_engine.py` around lines 595 - 626, The QA cache storage
path can call store_qa_cache with a None embedding if embed_text fails twice;
update the block guarded by settings.qa_cache_enabled / final_answer to ensure a
valid question_embedding before calling self.vector_store.store_qa_cache (e.g.,
attempt embedding via self.vector_store.embedding_function.embed_text and if it
returns None or raises, skip storing and log a warning), or wrap the entire
store_qa_cache call in a try/except that validates question_embedding is not
None before invoking store_qa_cache (refer to symbols question_embedding,
self.vector_store.embedding_function.embed_text,
self.vector_store.store_qa_cache, reflexion_memory, and logger) and include the
actual exception details in logger.error when storage or embedding fails.

Comment on lines +112 to +115
if line.strip().lower() == "\\e":
console.print("[dim]Multi-line input cancelled.[/dim]")
# Fall back to normal single-line input
return self.get_input(prompt)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid recursive re-entry from multiline cancel paths.

On Line 115 and Line 128, calling self.get_input(prompt) from inside _get_multiline_input() can build unbounded recursion during repeated cancel cycles.

Suggested fix
-                if raw_input.strip().lower() == "\\m":
-                    # Enter multiline mode
-                    return self._get_multiline_input(prompt)
+                if raw_input.strip().lower() == "\\m":
+                    # Enter multiline mode
+                    multiline_result = self._get_multiline_input(prompt)
+                    if multiline_result is None:
+                        continue
+                    return multiline_result
-                        # Fall back to normal single-line input
-                        return self.get_input(prompt)
+                        # Cancel multiline and return control to caller loop
+                        return None
...
-                    return self.get_input(prompt)
+                    return None

Also applies to: 126-128

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/input_handler.py` around lines 112 - 115, The multiline cancel path
in _get_multiline_input currently re-enters get_input(prompt) recursively, which
can cause unbounded recursion; change the cancel behavior so
_get_multiline_input returns a sentinel (e.g., None or empty string) instead of
calling self.get_input, and update get_input to handle that sentinel by
iterating/re-prompting in a loop (or repeating the input flow) rather than
calling itself recursively. Specifically modify _get_multiline_input (the branch
that checks for "\\e") to return a value indicating "cancelled" and adjust
get_input to detect that return, continue its prompt loop, and present the
single-line prompt again without recursion.

Comment thread src/utils/preflight.py
Comment on lines +47 to +63
value = getattr(settings, attr_name, None)

# Check if value is present and not a placeholder
is_present = False
source = "missing"

if value:
# For GitHub token, ghp_ prefix is valid
if attr_name == "github_token" and value.startswith(
("ghp_", "github_pat_")
):
is_present = True
source = "env"
elif not value.lower().startswith("your_"):
is_present = True
source = "env"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Trim credential values before presence checks.

On Line 53, whitespace-only strings (e.g. " ") are treated as present, which can produce false “ready” status for required credentials.

Suggested fix
-        value = getattr(settings, attr_name, None)
+        raw_value = getattr(settings, attr_name, None)
+        value = raw_value.strip() if isinstance(raw_value, str) else raw_value
...
-        if value:
+        if isinstance(value, str) and value:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/preflight.py` around lines 47 - 63, The presence check treats
whitespace-only credential strings as valid; update the logic in the preflight
check (where value = getattr(settings, attr_name, None) and the subsequent
is_present/source determination) to trim string values first (e.g., apply
.strip() to value when it's a str) and then perform the github_token prefix
checks and the "your_" placeholder check against the trimmed value so that
strings composed only of whitespace are considered missing.

Comment on lines +478 to +487
try:
# Search for similar questions using cosine similarity
query = f"""
SELECT id, question, answer, metadata,
vector::similarity::cosine(question_embedding, {query_embedding}) AS score
FROM qa_history
WHERE vector::similarity::cosine(question_embedding, {query_embedding}) >= {threshold}
ORDER BY score DESC
LIMIT 1;
"""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

SQL injection vector through string interpolation.

The static analysis flagged S608 here. While query_embedding is a List[float] and threshold is a float (mitigating practical injection risk), directly interpolating Python values into query strings is a code smell and could become problematic if types aren't validated upstream.

Consider using SurrealDB's parameterized query syntax if supported, or add explicit type validation:

🛡️ Defensive validation approach
     async def lookup_qa_cache(
         self,
         query_embedding: List[float],
         threshold: float = 0.85,
     ) -> Dict[str, Any] | None:
         """Lookup similar question in cache by embedding similarity"""
         await self._ensure_connection()

         if not self.client:
             raise VectorStoreException("Client not connected")

+        # Validate types to prevent injection
+        if not isinstance(threshold, (int, float)) or not (0.0 <= threshold <= 1.0):
+            raise ValueError("threshold must be a float between 0.0 and 1.0")
+        if not isinstance(query_embedding, list) or not all(isinstance(x, (int, float)) for x in query_embedding):
+            raise ValueError("query_embedding must be a list of floats")
+
         try:
             # Search for similar questions using cosine similarity
             query = f"""
🧰 Tools
🪛 Ruff (0.15.5)

[error] 480-487: Possible SQL injection vector through string-based query construction

(S608)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/vectorstore/surrealdb_store.py` around lines 478 - 487, The SQL built
into the query variable currently interpolates Python values directly
(query_embedding and threshold) which risks injection; change this to a
parameterized SurrealDB query using placeholders (e.g. $query_embedding and
$threshold) instead of f-string interpolation and pass a validated params dict
when executing, or—if SurrealDB parameterization isn't available—perform strict
defensive validation/conversion: ensure query_embedding is explicitly converted
to a List[float] and threshold to a float before embedding, and only then format
a safe, canonical representation; update the call that executes this query to
use the new params and reference the qa_history table and
vector::similarity::cosine usage accordingly.

Comment on lines +95 to +101
word_count = len(full_content.split()) if full_content else 0

min_length = settings.web_search_min_content_length
if word_count > 0 and len(full_content) < min_length:
status = WebSearchStatus.LOW_QUALITY
else:
status = WebSearchStatus.SUCCESS

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Edge case: word_count is zero when full_content is falsy, but status defaults to SUCCESS.

When full_content is empty/None, word_count will be 0, but the condition word_count > 0 and len(full_content) < min_length is False, so status defaults to SUCCESS. An empty result marked as SUCCESS may be misleading.

♻️ Suggested fix
             word_count = len(full_content.split()) if full_content else 0

             min_length = settings.web_search_min_content_length
-            if word_count > 0 and len(full_content) < min_length:
+            if not full_content or word_count == 0:
+                status = WebSearchStatus.LOW_QUALITY
+            elif len(full_content) < min_length:
                 status = WebSearchStatus.LOW_QUALITY
             else:
                 status = WebSearchStatus.SUCCESS
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/websearch/tavily_search.py` around lines 95 - 101, The current logic can
mark empty or None full_content as SUCCESS; update the branching around
word_count/full_content so empty content is explicitly handled: if full_content
is falsy or word_count == 0 set status to WebSearchStatus.NO_CONTENT (or
WebSearchStatus.LOW_QUALITY if NO_CONTENT doesn't exist), else compute
min_length = settings.web_search_min_content_length and if len(full_content) <
min_length set status = WebSearchStatus.LOW_QUALITY, otherwise set status =
WebSearchStatus.SUCCESS; adjust the block around variables full_content,
word_count, min_length, and the WebSearchStatus assignments accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants