Migrate web search to Tavily and refactor imports by CLoaKY233 · Pull Request #5 · CLoaKY233/Multi-cycle-RAG

CLoaKY233 · 2026-03-15T10:01:25Z

This pull request introduces several improvements and new features across configuration, database schema, developer tooling, and prompt templates. The key updates include switching web search integration from Google to Tavily, adding semantic QA cache support, enhancing developer and validation workflows, and refining prompt generation guidelines.

Web Search Integration and Configuration:

Switched web search provider from Google Custom Search to Tavily, updating .env.example and src/config/settings.py to use TAVILY_API_KEY and related Tavily-specific settings. Added a diagnostic script check_tavily.py to verify Tavily integration. [1] [2] [3]

Semantic QA Cache Support:

Added schema and SurrealDB function for a semantic QA cache: new qa_history table, a similarity search function fn::qa_cache_lookup, and related config options to enable and set threshold for cache lookups. [1] [2] [3]

Database Schema Improvements:

Made table creation idempotent by adding IF NOT EXISTS to documents and web_search tables, and removed unused HNSW index definitions for cleaner SurrealDB schema migrations. [1] [2]

Developer Tooling and Validation:

Added developer and validation infrastructure: .factory/services.yaml with install/test/lint commands, .factory/init.sh for environment sync, user testing documentation, and performance/parallelism validation results and review. [1] [2] [3] [4] [5]
Added a detailed skill description for the backend worker, outlining procedures for database, Python, and CLI enhancements.

Prompt Template and Dependency Updates:

Refined prompt templates to enforce complete, self-contained responses, and clarified formatting for the "Sources" section. [1] [2] [3] [4] [5]
Updated dependencies to include tavily-python and optional developer dependencies (pytest, pytest-asyncio), and improved Pyright and pytest configuration.

Other Minor Improvements:

Fixed import paths in src/__init__.py to use absolute imports.
Added an abstract property for embedding_function in the vector store interface for better extensibility.

Let me know if you want a deeper dive into any of these areas!

Summary by CodeRabbit

New Features
- Web search now uses Tavily API for improved search results
- Added QA semantic caching to reuse similar answers and reduce processing time
- Enhanced CLI with interactive runtime configuration menu for web search, reflexion cycles, confidence thresholds, and cache controls
- Added pre-flight credential validation checks for required services
Chores
- Updated dependencies to support Tavily integration and testing framework

coderabbitai · 2026-03-15T10:01:40Z

📝 Walkthrough

Walkthrough

This PR introduces Tavily-based web search replacing Google Custom Search, implements QA semantic caching, adds runtime configuration support, refactors imports to absolute paths, creates a preflight credential validation system, and enhances the interactive CLI with dynamic runtime adjustments and status reporting.

Changes

Cohort / File(s)	Summary
Web Search Migration `.env.example`, `src/websearch/__init__.py`, `src/websearch/google_search.py`, `src/websearch/tavily_search.py`, `check_tavily.py`	Replaces Google Custom Search API with Tavily API integration. Removes google_api_key/google_cse_id, adds TAVILY_API_KEY with new web search controls (content extraction, title length, content length, timeout). Adds diagnostic script for Tavily configuration validation. GoogleWebSearch class deleted; TavilyWebSearch added with native API integration and lazy client initialization.
Configuration & Settings `src/config/settings.py`, `pyproject.toml`	Replaces Google credentials (google_api_key, google_cse_id) with tavily_api_key. Adds QA cache settings (qa_cache_enabled, qa_cache_similarity_threshold). Adds tavily-python>=0.7.23 dependency and pytest/pytest-asyncio dev dependencies. Adds pytest configuration with asyncio_mode and testpaths.
Database Schema `schema/documents.surql`, `schema/web_search.surql`, `schema/qa_history.surql`, `schema/qa_cache_lookup.surql`	Converts table definitions to IF NOT EXISTS form. Removes HNSW vector indexes from documents and web_search tables. Adds new qa_history table with question embeddings, answers, and metadata for semantic caching. Adds qa_cache_lookup SurrealQL function for similarity-based lookup.
Core Engine & RAG `src/rag/engine.py`, `src/rag/reflexion_engine.py`, `src/core/interfaces.py`	Adds public wrapper methods to RAGEngine for runtime configuration (web_search_mode, max_cycles, confidence_threshold, qa_cache control). Adds QA cache and web search deletion methods. ReflexionRAGEngine replaces GoogleWebSearch with TavilyWebSearch, implements QA cache lookup during query processing, parallelizes DB/web retrieval via asyncio.gather, stores QA pairs with embeddings and metadata. Enhances VectorStoreInterface with embedding_function property, store_qa_cache, lookup_qa_cache, delete_all_web_searches, clear_qa_cache methods.
Vector Store Implementation `src/vectorstore/surrealdb_store.py`	Implements QA cache methods (store_qa_cache, lookup_qa_cache, clear_qa_cache). Exposes embedding_function property. Removes HNSW index initialization from schema setup. Parallelizes similarity_search_combined using asyncio.gather. Adds QA history table initialization with question embeddings and metadata fields.
Import Path Refactoring `src/__init__.py`, `src/data/loader.py`, `src/data/processor.py`, `src/embeddings/github_embeddings.py`, `src/llm/github_llm.py`, `src/memory/cache.py`, `src/reflexion/evaluator.py`	Converts relative imports to absolute src.* imports across data, embeddings, LLM, memory, and reflexion modules. Updates public API in src/init.py to export Document and settings alongside RAGEngine.
LLM Model Capabilities `src/llm/github_llm.py`	Adds model capability detection methods (_requires_completion_tokens, _model_supports_temperature, _is_temperature_error, _get_max_tokens_param, _build_params). Implements automatic retry logic in generate_stream and chat_stream to handle models with unsupported temperature parameter. Maintains streaming behavior with usage tracking.
Embedding Async Support `src/embeddings/github_embeddings.py`	Wraps synchronous embedding client calls with asyncio.to_thread for asynchronous execution in embed_text and embed_documents methods.
Interactive CLI Enhancements `rag.py`, `src/utils/input_handler.py`, `src/utils/preflight.py`	Introduces RuntimeConfig class to manage runtime settings. Enhances InteractiveRAGChat with status bar rendering, dynamic runtime configuration menu, and input handler integration. Adds CLI options for docs path, ingestion, web search control, reflexion cycles, threshold, cache management, and model override. Implements preflight credential validation (PreflightChecker) for required (GitHub, SurrealDB) and optional (Tavily) credentials. Adds check CLI command for pre-flight validation. Implements MultiLineInput and EnhancedInputHandler for improved interactive input with history and multiline support.
Prompt Templates `prompts/generation/initial_generation.yaml`, `prompts/generation/reflexion_generation.yaml`, `prompts/generation/simple_generation.yaml`	Adds explicit guidelines requiring complete, self-contained responses that never end mid-sentence or mid-list. Adds guideline that Sources section must be final and each source entry must end with a period. Ensures consistent formatting constraints across all generation prompt variants.
Testing `tests/test_parallelism.py`	Adds comprehensive test suite validating asyncio concurrency patterns, absence of sleep calls in main loops, use of asyncio.gather in parallel operations, and timing comparisons of concurrent vs. sequential execution. Includes AST-based analysis tests to verify parallelization patterns.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A web search hops from Google to Tavily's door,
Caching questions with embeddings galore!
Runtime configs dance in menus bright,
Preflight checks ensure credentials are right.
Asyncio gathers in parallel flight—
bounce bounce ✨ The RAG engine takes flight!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Migrate web search to Tavily and refactor imports' accurately and concisely summarizes the two main components of this changeset: the migration from Google Custom Search to Tavily, and the refactoring of import paths from relative to absolute throughout the codebase.
Docstring Coverage	✅ Passed	Docstring coverage is 84.55% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch migrate_to_tavily

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

CodeRabbit can scan for known vulnerabilities in your dependencies using OSV Scanner.

OSV Scanner will automatically detect and report security vulnerabilities in your project's dependencies. No additional configuration is required.

coderabbitai

Actionable comments posted: 9

🧹 Nitpick comments (4)

src/utils/preflight.py (1)

28-37: Prefer immutable credential definitions at class scope.

REQUIRED_CREDENTIALS and OPTIONAL_CREDENTIALS are constants semantically; using tuples avoids accidental runtime mutation.

Suggested refactor

-    REQUIRED_CREDENTIALS = [
+    REQUIRED_CREDENTIALS = (
         ("github_token", "GitHub Token", "Required for LLM and embeddings"),
         ("surrealdb_url", "SurrealDB URL", "Database connection"),
         ("surrealdb_user", "SurrealDB Username", "Database authentication"),
         ("surrealdb_pass", "SurrealDB Password", "Database authentication"),
-    ]
+    )

-    OPTIONAL_CREDENTIALS = [
+    OPTIONAL_CREDENTIALS = (
         ("tavily_api_key", "Tavily API Key", "Required for web search"),
-    ]
+    )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/preflight.py` around lines 28 - 37, REQUIRED_CREDENTIALS and
OPTIONAL_CREDENTIALS are defined as mutable lists; change them to immutable
tuples and move them to the appropriate class as class-level constants (e.g.,
Preflight.REQUIRED_CREDENTIALS and Preflight.OPTIONAL_CREDENTIALS) so they
cannot be mutated at runtime; keep the same tuple entries and names to preserve
usage.

src/vectorstore/surrealdb_store.py (1)

23-26: Return type mismatch with interface declaration.

The interface declares embedding_function returns EmbeddingInterface, but this implementation returns GithubEmbeddings. This works at runtime (assuming GithubEmbeddings is a subclass), but the type annotation should match the abstract declaration for consistency.

♻️ Suggested fix

+from src.core.interfaces import EmbeddingInterface
+
     `@property`
-    def embedding_function(self) -> GithubEmbeddings:
+    def embedding_function(self) -> EmbeddingInterface:
         """Get the embedding function used by this vector store"""
         return self._embedding_function

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/vectorstore/surrealdb_store.py` around lines 23 - 26, The
embedding_function property currently types its return as GithubEmbeddings but
the interface declares EmbeddingInterface; update the annotation on the
embedding_function property to return EmbeddingInterface (and ensure the
_embedding_function field is typed accordingly) so the signature matches the
abstract/interface declaration (replace GithubEmbeddings -> EmbeddingInterface
in the property return type and related attribute type hints).

tests/test_parallelism.py (2)

58-64: _has_gather_in_function doesn't verify the call is asyncio.gather.

Similar to the sleep check, this only verifies .gather attribute access. A false positive would occur if any other module's .gather() method exists in the function.

♻️ Suggested improvement

 def _has_gather_in_function(func_node: ast.AST) -> bool:
     """Return True if *func_node* contains an asyncio.gather call."""
     for node in ast.walk(func_node):
         if isinstance(node, ast.Call):
-            if isinstance(node.func, ast.Attribute) and node.func.attr == "gather":
-                return True
+            if isinstance(node.func, ast.Attribute) and node.func.attr == "gather":
+                # Verify it's asyncio.gather
+                if isinstance(node.func.value, ast.Name) and node.func.value.id == "asyncio":
+                    return True
     return False

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test_parallelism.py` around lines 58 - 64, The function
_has_gather_in_function currently only checks for an Attribute named "gather"
and can false-positive on other modules; update it to ensure the call target is
specifically asyncio.gather by checking that node is an ast.Call, node.func is
an ast.Attribute with attr == "gather", and node.func.value is an ast.Name with
id == "asyncio" (or handle a dotted attribute chain where the base Name is
"asyncio"), returning True only in that case; use the existing symbols ast.Call,
node.func, ast.Attribute, and _has_gather_in_function to locate and change the
logic.

42-55: _collect_sleep_calls may produce false positives for non-asyncio sleep calls.

The helper only checks func.attr == "sleep" without verifying the value is actually asyncio. This could match time.sleep(...), some_lib.sleep(...), or any other .sleep() attribute call.

Consider tightening the check:

♻️ Suggested improvement

 def _collect_sleep_calls(tree: ast.AST) -> List[ast.Await]:
     """Return all AST Await nodes that call asyncio.sleep(...)."""
     sleeps = []
     for node in ast.walk(tree):
         if not isinstance(node, ast.Await):
             continue
         call = node.value
         if not isinstance(call, ast.Call):
             continue
         func = call.func
         # Accept both `asyncio.sleep(...)` (Attribute) forms
-        if isinstance(func, ast.Attribute) and func.attr == "sleep":
-            sleeps.append(node)
+        if isinstance(func, ast.Attribute) and func.attr == "sleep":
+            # Verify it's asyncio.sleep specifically
+            if isinstance(func.value, ast.Name) and func.value.id == "asyncio":
+                sleeps.append(node)
     return sleeps

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test_parallelism.py` around lines 42 - 55, The helper
_collect_sleep_calls is currently matching any attribute named "sleep" (e.g.,
time.sleep) and should be tightened to only match asyncio.sleep; update the
conditional inside _collect_sleep_calls so it requires isinstance(func,
ast.Attribute) and func.attr == "sleep" and that func.value is an ast.Name with
id == "asyncio" (i.e., check isinstance(func.value, ast.Name) and func.value.id
== "asyncio") so only asyncio.sleep(Attribute) calls are collected; if you need
to also support "from asyncio import sleep" cases, handle ast.Name(func)
separately but only after resolving imports/aliases to confirm it comes from
asyncio.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@check_tavily.py`:
- Around line 50-56: The check_settings() block currently accepts any non-empty
settings.tavily_api_key (causing false positives for placeholder values like
"your_tavily_api_key_here"); update check_settings() to trim whitespace and
reject known placeholder patterns (e.g., exact matches like
"your_tavily_api_key_here" or values that start with "your_" or contain
"example" or "placeholder") before returning True; when rejecting, print the
existing error message (or a slightly more specific one) and return False so
only real API keys pass.

In `@src/config/settings.py`:
- Around line 83-84: The qa_cache_similarity_threshold Field currently accepts
any float and should be constrained to [0.0, 1.0]; update the Field definition
for qa_cache_similarity_threshold in settings.py (the
qa_cache_similarity_threshold symbol) to enforce bounds (e.g., use
Field(default=0.8, ge=0.0, le=1.0) or an equivalent pydantic constrained type)
so values outside 0–1 are rejected/validated.

In `@src/llm/github_llm.py`:
- Around line 130-131: The current synchronous client.complete calls block the
event loop; switch to the async client by importing
azure.ai.inference.aio.ChatCompletionsClient, instantiate it in the class
__init__ (assign to self.client), and replace synchronous calls to
self.client.complete(...) inside async methods with awaited calls (await
self.client.complete(...)); ensure the client is closed/async-disposed where
appropriate and update any initialization/cleanup logic used by the class that
creates the client.
- Around line 83-89: The method _get_max_tokens_param currently drops token
limits when self._use_completion_tokens is True; instead, return the Azure
pass-through form so limits are honored: when self._use_completion_tokens is
True return a dict that supplies model_extras with the key
"max_completion_tokens" set to max_tokens_value (e.g. {"model_extras":
{"max_completion_tokens": max_tokens_value}}), and update the inline comment to
remove the incorrect claim; keep the existing {"max_tokens": max_tokens_value}
return for the non-completion branch and ensure any callers that set headers
include the required "extra-parameters: pass-through" behavior elsewhere in the
codebase.

In `@src/rag/reflexion_engine.py`:
- Around line 595-626: The QA cache storage path can call store_qa_cache with a
None embedding if embed_text fails twice; update the block guarded by
settings.qa_cache_enabled / final_answer to ensure a valid question_embedding
before calling self.vector_store.store_qa_cache (e.g., attempt embedding via
self.vector_store.embedding_function.embed_text and if it returns None or
raises, skip storing and log a warning), or wrap the entire store_qa_cache call
in a try/except that validates question_embedding is not None before invoking
store_qa_cache (refer to symbols question_embedding,
self.vector_store.embedding_function.embed_text,
self.vector_store.store_qa_cache, reflexion_memory, and logger) and include the
actual exception details in logger.error when storage or embedding fails.

In `@src/utils/input_handler.py`:
- Around line 112-115: The multiline cancel path in _get_multiline_input
currently re-enters get_input(prompt) recursively, which can cause unbounded
recursion; change the cancel behavior so _get_multiline_input returns a sentinel
(e.g., None or empty string) instead of calling self.get_input, and update
get_input to handle that sentinel by iterating/re-prompting in a loop (or
repeating the input flow) rather than calling itself recursively. Specifically
modify _get_multiline_input (the branch that checks for "\\e") to return a value
indicating "cancelled" and adjust get_input to detect that return, continue its
prompt loop, and present the single-line prompt again without recursion.

In `@src/utils/preflight.py`:
- Around line 47-63: The presence check treats whitespace-only credential
strings as valid; update the logic in the preflight check (where value =
getattr(settings, attr_name, None) and the subsequent is_present/source
determination) to trim string values first (e.g., apply .strip() to value when
it's a str) and then perform the github_token prefix checks and the "your_"
placeholder check against the trimmed value so that strings composed only of
whitespace are considered missing.

In `@src/vectorstore/surrealdb_store.py`:
- Around line 478-487: The SQL built into the query variable currently
interpolates Python values directly (query_embedding and threshold) which risks
injection; change this to a parameterized SurrealDB query using placeholders
(e.g. $query_embedding and $threshold) instead of f-string interpolation and
pass a validated params dict when executing, or—if SurrealDB parameterization
isn't available—perform strict defensive validation/conversion: ensure
query_embedding is explicitly converted to a List[float] and threshold to a
float before embedding, and only then format a safe, canonical representation;
update the call that executes this query to use the new params and reference the
qa_history table and vector::similarity::cosine usage accordingly.

In `@src/websearch/tavily_search.py`:
- Around line 95-101: The current logic can mark empty or None full_content as
SUCCESS; update the branching around word_count/full_content so empty content is
explicitly handled: if full_content is falsy or word_count == 0 set status to
WebSearchStatus.NO_CONTENT (or WebSearchStatus.LOW_QUALITY if NO_CONTENT doesn't
exist), else compute min_length = settings.web_search_min_content_length and if
len(full_content) < min_length set status = WebSearchStatus.LOW_QUALITY,
otherwise set status = WebSearchStatus.SUCCESS; adjust the block around
variables full_content, word_count, min_length, and the WebSearchStatus
assignments accordingly.

---

Nitpick comments:
In `@src/utils/preflight.py`:
- Around line 28-37: REQUIRED_CREDENTIALS and OPTIONAL_CREDENTIALS are defined
as mutable lists; change them to immutable tuples and move them to the
appropriate class as class-level constants (e.g., Preflight.REQUIRED_CREDENTIALS
and Preflight.OPTIONAL_CREDENTIALS) so they cannot be mutated at runtime; keep
the same tuple entries and names to preserve usage.

In `@src/vectorstore/surrealdb_store.py`:
- Around line 23-26: The embedding_function property currently types its return
as GithubEmbeddings but the interface declares EmbeddingInterface; update the
annotation on the embedding_function property to return EmbeddingInterface (and
ensure the _embedding_function field is typed accordingly) so the signature
matches the abstract/interface declaration (replace GithubEmbeddings ->
EmbeddingInterface in the property return type and related attribute type
hints).

In `@tests/test_parallelism.py`:
- Around line 58-64: The function _has_gather_in_function currently only checks
for an Attribute named "gather" and can false-positive on other modules; update
it to ensure the call target is specifically asyncio.gather by checking that
node is an ast.Call, node.func is an ast.Attribute with attr == "gather", and
node.func.value is an ast.Name with id == "asyncio" (or handle a dotted
attribute chain where the base Name is "asyncio"), returning True only in that
case; use the existing symbols ast.Call, node.func, ast.Attribute, and
_has_gather_in_function to locate and change the logic.
- Around line 42-55: The helper _collect_sleep_calls is currently matching any
attribute named "sleep" (e.g., time.sleep) and should be tightened to only match
asyncio.sleep; update the conditional inside _collect_sleep_calls so it requires
isinstance(func, ast.Attribute) and func.attr == "sleep" and that func.value is
an ast.Name with id == "asyncio" (i.e., check isinstance(func.value, ast.Name)
and func.value.id == "asyncio") so only asyncio.sleep(Attribute) calls are
collected; if you need to also support "from asyncio import sleep" cases, handle
ast.Name(func) separately but only after resolving imports/aliases to confirm it
comes from asyncio.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 880b9a68-fb05-40ea-84f0-4439d5e95e69

📥 Commits

Reviewing files that changed from the base of the PR and between d59ff5e and 81937cb.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (16)

.env.example
check_tavily.py
pyproject.toml
rag.py
schema/qa_cache_lookup.surql
src/config/settings.py
src/core/interfaces.py
src/embeddings/github_embeddings.py
src/llm/github_llm.py
src/rag/engine.py
src/rag/reflexion_engine.py
src/utils/input_handler.py
src/utils/preflight.py
src/vectorstore/surrealdb_store.py
src/websearch/tavily_search.py
tests/test_parallelism.py

🚧 Files skipped from review as they are similar to previous changes (1)

schema/qa_cache_lookup.surql

coderabbitai · 2026-03-15T11:12:11Z

+        key = settings.tavily_api_key
+        if key:
+            print("  OK   : settings.tavily_api_key loaded")
+            return True
+        else:
+            print("  ERROR: settings.tavily_api_key is empty.")
+            return False


⚠️ Potential issue | 🟡 Minor

Apply placeholder filtering in settings check as well.

On Line 51, check_settings() treats any non-empty value as valid, including your_tavily_api_key_here, so this step can report a false positive.

Suggested fix

- if key: + if key and key != "your_tavily_api_key_here": print(" OK : settings.tavily_api_key loaded") return True else: print(" ERROR: settings.tavily_api_key is empty.") return False

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@check_tavily.py` around lines 50 - 56, The check_settings() block currently accepts any non-empty settings.tavily_api_key (causing false positives for placeholder values like "your_tavily_api_key_here"); update check_settings() to trim whitespace and reject known placeholder patterns (e.g., exact matches like "your_tavily_api_key_here" or values that start with "your_" or contain "example" or "placeholder") before returning True; when rejecting, print the existing error message (or a slightly more specific one) and return False so only real API keys pass.

coderabbitai · 2026-03-15T11:12:11Z

+    qa_cache_enabled: bool = Field(default=True)
+    qa_cache_similarity_threshold: float = Field(default=0.8)


⚠️ Potential issue | 🟡 Minor

Constrain QA cache similarity threshold range.

On Line 84, qa_cache_similarity_threshold accepts any float. Values outside [0.0, 1.0] can silently break cache hit behavior.

Suggested fix

- qa_cache_similarity_threshold: float = Field(default=0.8) + qa_cache_similarity_threshold: float = Field(default=0.8, ge=0.0, le=1.0)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

qa_cache_enabled: bool = Field(default=True)

qa_cache_similarity_threshold: float = Field(default=0.8)

qa_cache_enabled: bool = Field(default=True)

qa_cache_similarity_threshold: float = Field(default=0.8, ge=0.0, le=1.0)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/config/settings.py` around lines 83 - 84, The qa_cache_similarity_threshold Field currently accepts any float and should be constrained to [0.0, 1.0]; update the Field definition for qa_cache_similarity_threshold in settings.py (the qa_cache_similarity_threshold symbol) to enforce bounds (e.g., use Field(default=0.8, ge=0.0, le=1.0) or an equivalent pydantic constrained type) so values outside 0–1 are rejected/validated.

coderabbitai · 2026-03-15T11:12:11Z

+    def _get_max_tokens_param(self, max_tokens_value: int) -> dict:
+        """Get the appropriate max tokens parameter for the model"""
+        if self._use_completion_tokens:
+            # Azure AI Inference SDK doesn't support max_completion_tokens
+            # For newer models, we omit the parameter and let the model use its default
+            return {}
+        return {"max_tokens": max_tokens_value}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For the azure-ai-inference Python SDK version used here, what is the supported way to cap output tokens for ChatCompletionsClient when a model requires completion-token style limits (e.g., max_completion_tokens)?

💡 Result:

In the azure-ai-inference Python SDK (e.g., ChatCompletionsClient in 1.0.0b9), the supported way is to pass model-specific fields via model_extras, and instruct the service to forward unknown fields by setting the extra-parameters: pass-through header. The SDK documents that model_extras is injected “as-is” at the root of the JSON body and that handling depends on the extra-parameters request header. (azuresdkdocs.z19.web.core.windows.net) The REST API docs describe extra-parameters: pass-through for forwarding such parameters. (learn.microsoft.com)

resp = client.complete( messages=[...], # don't use max_tokens for these models model_extras={"max_completion_tokens": 512}, headers={"extra-parameters": "pass-through"}, )

Citations:

1: https://azuresdkdocs.z19.web.core.windows.net/python/azure-ai-inference/1.0.0b9/azure.ai.inference.html

2: https://learn.microsoft.com/sv-se/rest/api/aifoundry/modelinference/

🏁 Script executed:

# Find the SDK version being used find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "poetry.lock" | head -20

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 85

🏁 Script executed:

# Search for azure-ai-inference version specification rg "azure-ai-inference" --type py --type txt

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 51

🏁 Script executed:

# Find all usages of _get_max_tokens_param rg "_get_max_tokens_param" -B 2 -A 5

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 1132

🏁 Script executed:

cat pyproject.toml

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 778

🏁 Script executed:

# Get broader context around the params.update call rg "params.update.*_get_max_tokens_param" -B 20 -A 10 src/llm/github_llm.py

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 1197

🏁 Script executed:

# Find where params are used (likely in a complete() or similar call) rg "def.*complete|def.*chat" -A 30 src/llm/github_llm.py | head -80

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 1319

🏁 Script executed:

# Check if complete() call shows any header parameter usage in the file rg "complete\(" -A 5 src/llm/github_llm.py | head -30

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 563

🏁 Script executed:

# Search for any header-related code in the file rg "header|Header" src/llm/github_llm.py

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 51

Update the method to support max_completion_tokens for models that require it, or document why limits are being dropped.

The azure-ai-inference SDK (1.0.0b9+) does support max_completion_tokens—contrary to the inline comment—by passing model_extras={"max_completion_tokens": value} with the extra-parameters: pass-through header. Silently returning {} on line 88 discards the caller's token limit entirely, risking unexpectedly long and costly completions. Either implement the proper limit using the supported mechanism, or explicitly document and justify why newer models use unbounded output.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/llm/github_llm.py` around lines 83 - 89, The method _get_max_tokens_param currently drops token limits when self._use_completion_tokens is True; instead, return the Azure pass-through form so limits are honored: when self._use_completion_tokens is True return a dict that supplies model_extras with the key "max_completion_tokens" set to max_tokens_value (e.g. {"model_extras": {"max_completion_tokens": max_tokens_value}}), and update the inline comment to remove the incorrect claim; keep the existing {"max_tokens": max_tokens_value} return for the non-completion branch and ensure any callers that set headers include the required "extra-parameters: pass-through" behavior elsewhere in the codebase.

coderabbitai · 2026-03-15T11:12:11Z

+                response = self.client.complete(**params)  # type: ignore[attr-defined]
+


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash python - <<'PY' import inspect try: from azure.ai.inference import ChatCompletionsClient print("sync complete signature:", inspect.signature(ChatCompletionsClient.complete)) except Exception as e: print("sync client import error:", e) try: from azure.ai.inference.aio import ChatCompletionsClient as AioChatCompletionsClient print("aio client available: yes") print("aio complete signature:", inspect.signature(AioChatCompletionsClient.complete)) print("aio complete is coroutine:", inspect.iscoroutinefunction(AioChatCompletionsClient.complete)) except Exception as e: print("aio client available: no") print("aio import error:", e) PY

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 185

🏁 Script executed:

# Check dependencies find . -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" | head -5

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 85

🏁 Script executed:

# Examine the file in question cat -n src/llm/github_llm.py | sed -n '120,210p'

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 4551

🏁 Script executed:

# Check for any async client usage patterns elsewhere rg "aio|async.*client|ChatCompletionsClient" --type py

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 214

🏁 Script executed:

cat pyproject.toml | grep -A 20 -B 5 "azure"

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 640

🏁 Script executed:

# Also check if there's any async usage pattern elsewhere in the codebase rg "async def|await" src/llm/ --type py -B 2 -A 2

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 633

🌐 Web query:

azure-ai-inference 1.0.0 async ChatCompletionsClient aio documentation

💡 Result:

For async chat completions in azure-ai-inference, use the azure.ai.inference.aio.ChatCompletionsClient (the async client lives under the aio namespace). The SDK docs are versioned; the current published docs I can find are for azure-ai-inference 1.0.0b9 (beta), not a GA 1.0.0. [1]

Async client API docs (aio):

azure.ai.inference.aio.ChatCompletionsClient (constructor, async complete(...), async close(), etc.) [2]

PyPI “async” notes (install + import path):

Install aiohttp, then import from azure.ai.inference.aio instead of azure.ai.inference. [1]

Minimal async usage (from the documented pattern):

import asyncio from azure.ai.inference.aio import ChatCompletionsClient from azure.ai.inference.models import SystemMessage, UserMessage from azure.core.credentials import AzureKeyCredential async def main(): client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key)) try: resp = await client.complete(messages=[ SystemMessage("You are a helpful assistant."), UserMessage("How many feet are in a mile?") ]) print(resp.choices[0].message.content) finally: await client.close() asyncio.run(main())

[1][2]

Sources

azure-ai-inference PyPI README (async client + aiohttp + azure.ai.inference.aio import). [1]

Azure SDK for Python API reference: azure.ai.inference.aio.ChatCompletionsClient (async complete, etc.). [2]

🏁 Script executed:

# Check if there are any other LLM clients in the codebase that use async patterns correctly rg "async for|await.*complete|aio" --type py src/

Repository: CLoaKY233/Multi-cycle-RAG

Length of output: 1221

Replace sync client with async client to avoid blocking the event loop.

Lines 130 and 202 execute synchronous client.complete() calls inside async def methods. The azure-ai-inference SDK provides an async variant at azure.ai.inference.aio.ChatCompletionsClient with an async complete() method that must be awaited. The current sync call will block the event loop under concurrent load, degrading performance when multiple streams are active (confirmed usage pattern in src/reflexion/evaluator.py and src/rag/reflexion_engine.py).

Replace with:

from azure.ai.inference.aio import ChatCompletionsClient # In __init__: self.client = ChatCompletionsClient(...) # At lines 130, 202: response = await self.client.complete(**params)

Also applies to: 202-203

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/llm/github_llm.py` around lines 130 - 131, The current synchronous client.complete calls block the event loop; switch to the async client by importing azure.ai.inference.aio.ChatCompletionsClient, instantiate it in the class __init__ (assign to self.client), and replace synchronous calls to self.client.complete(...) inside async methods with awaited calls (await self.client.complete(...)); ensure the client is closed/async-disposed where appropriate and update any initialization/cleanup logic used by the class that creates the client.

coderabbitai · 2026-03-15T11:12:11Z

+            # Store in QA semantic cache for future similar questions
+            if settings.qa_cache_enabled and final_answer:
+                try:
+                    # Reuse the embedding computed before the reflexion loop
+                    if question_embedding is None:
+                        question_embedding = (
+                            await self.vector_store.embedding_function.embed_text(
+                                question
+                            )
+                        )
+
+                    await self.vector_store.store_qa_cache(
+                        question=question,
+                        question_embedding=question_embedding,
+                        answer=final_answer,
+                        metadata={
+                            "total_cycles": len(reflexion_memory.cycles),
+                            "total_processing_time": reflexion_memory.total_processing_time,
+                            "final_confidence": reflexion_memory.cycles[
+                                -1
+                            ].evaluation.confidence_score
+                            if reflexion_memory.cycles
+                            else 0.0,
+                            "web_search_used": any(
+                                cycle.web_search_enabled
+                                for cycle in reflexion_memory.cycles
+                            ),
+                        },
+                    )
+                    logger.info("Stored QA cache entry for future similar questions")
+                except Exception as e:
+                    logger.error("Error storing QA cache", error=str(e))


⚠️ Potential issue | 🟡 Minor

Potential issue: question_embedding could be None when storing QA cache.

If settings.qa_cache_enabled is True but the initial embedding fails (line 152 sets question_embedding = None), the code at line 599 regenerates the embedding. However, if this second embedding call also fails, store_qa_cache would be called with a potentially invalid embedding.

Consider wrapping the entire QA cache storage in a try-except or checking the embedding result:

🛡️ Suggested improvement

if settings.qa_cache_enabled and final_answer: try: # Reuse the embedding computed before the reflexion loop if question_embedding is None: question_embedding = ( await self.vector_store.embedding_function.embed_text( question ) ) + + if not question_embedding: + logger.warning("Failed to generate question embedding for QA cache") + raise ValueError("Empty embedding") await self.vector_store.store_qa_cache(

🧰 Tools

🪛 Ruff (0.15.5)

[warning] 625-625: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/rag/reflexion_engine.py` around lines 595 - 626, The QA cache storage path can call store_qa_cache with a None embedding if embed_text fails twice; update the block guarded by settings.qa_cache_enabled / final_answer to ensure a valid question_embedding before calling self.vector_store.store_qa_cache (e.g., attempt embedding via self.vector_store.embedding_function.embed_text and if it returns None or raises, skip storing and log a warning), or wrap the entire store_qa_cache call in a try/except that validates question_embedding is not None before invoking store_qa_cache (refer to symbols question_embedding, self.vector_store.embedding_function.embed_text, self.vector_store.store_qa_cache, reflexion_memory, and logger) and include the actual exception details in logger.error when storage or embedding fails.

coderabbitai · 2026-03-15T11:12:11Z

+                    if line.strip().lower() == "\\e":
+                        console.print("[dim]Multi-line input cancelled.[/dim]")
+                        # Fall back to normal single-line input
+                        return self.get_input(prompt)


⚠️ Potential issue | 🟠 Major

Avoid recursive re-entry from multiline cancel paths.

On Line 115 and Line 128, calling self.get_input(prompt) from inside _get_multiline_input() can build unbounded recursion during repeated cancel cycles.

Suggested fix

- if raw_input.strip().lower() == "\\m": - # Enter multiline mode - return self._get_multiline_input(prompt) + if raw_input.strip().lower() == "\\m": + # Enter multiline mode + multiline_result = self._get_multiline_input(prompt) + if multiline_result is None: + continue + return multiline_result

- # Fall back to normal single-line input - return self.get_input(prompt) + # Cancel multiline and return control to caller loop + return None ... - return self.get_input(prompt) + return None

Also applies to: 126-128

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/utils/input_handler.py` around lines 112 - 115, The multiline cancel path in _get_multiline_input currently re-enters get_input(prompt) recursively, which can cause unbounded recursion; change the cancel behavior so _get_multiline_input returns a sentinel (e.g., None or empty string) instead of calling self.get_input, and update get_input to handle that sentinel by iterating/re-prompting in a loop (or repeating the input flow) rather than calling itself recursively. Specifically modify _get_multiline_input (the branch that checks for "\\e") to return a value indicating "cancelled" and adjust get_input to detect that return, continue its prompt loop, and present the single-line prompt again without recursion.

coderabbitai · 2026-03-15T11:12:11Z

+        value = getattr(settings, attr_name, None)
+
+        # Check if value is present and not a placeholder
+        is_present = False
+        source = "missing"
+
+        if value:
+            # For GitHub token, ghp_ prefix is valid
+            if attr_name == "github_token" and value.startswith(
+                ("ghp_", "github_pat_")
+            ):
+                is_present = True
+                source = "env"
+            elif not value.lower().startswith("your_"):
+                is_present = True
+                source = "env"
+


⚠️ Potential issue | 🟡 Minor

Trim credential values before presence checks.

On Line 53, whitespace-only strings (e.g. " ") are treated as present, which can produce false “ready” status for required credentials.

Suggested fix

- value = getattr(settings, attr_name, None) + raw_value = getattr(settings, attr_name, None) + value = raw_value.strip() if isinstance(raw_value, str) else raw_value ... - if value: + if isinstance(value, str) and value:

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/utils/preflight.py` around lines 47 - 63, The presence check treats whitespace-only credential strings as valid; update the logic in the preflight check (where value = getattr(settings, attr_name, None) and the subsequent is_present/source determination) to trim string values first (e.g., apply .strip() to value when it's a str) and then perform the github_token prefix checks and the "your_" placeholder check against the trimmed value so that strings composed only of whitespace are considered missing.

coderabbitai · 2026-03-15T11:12:11Z

+        try:
+            # Search for similar questions using cosine similarity
+            query = f"""
+                SELECT id, question, answer, metadata,
+                       vector::similarity::cosine(question_embedding, {query_embedding}) AS score
+                FROM qa_history
+                WHERE vector::similarity::cosine(question_embedding, {query_embedding}) >= {threshold}
+                ORDER BY score DESC
+                LIMIT 1;
+            """


⚠️ Potential issue | 🟡 Minor

SQL injection vector through string interpolation.

The static analysis flagged S608 here. While query_embedding is a List[float] and threshold is a float (mitigating practical injection risk), directly interpolating Python values into query strings is a code smell and could become problematic if types aren't validated upstream.

Consider using SurrealDB's parameterized query syntax if supported, or add explicit type validation:

🛡️ Defensive validation approach

async def lookup_qa_cache( self, query_embedding: List[float], threshold: float = 0.85, ) -> Dict[str, Any] | None: """Lookup similar question in cache by embedding similarity""" await self._ensure_connection() if not self.client: raise VectorStoreException("Client not connected") + # Validate types to prevent injection + if not isinstance(threshold, (int, float)) or not (0.0 <= threshold <= 1.0): + raise ValueError("threshold must be a float between 0.0 and 1.0") + if not isinstance(query_embedding, list) or not all(isinstance(x, (int, float)) for x in query_embedding): + raise ValueError("query_embedding must be a list of floats") + try: # Search for similar questions using cosine similarity query = f"""

🧰 Tools

🪛 Ruff (0.15.5)

[error] 480-487: Possible SQL injection vector through string-based query construction

(S608)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/vectorstore/surrealdb_store.py` around lines 478 - 487, The SQL built into the query variable currently interpolates Python values directly (query_embedding and threshold) which risks injection; change this to a parameterized SurrealDB query using placeholders (e.g. $query_embedding and $threshold) instead of f-string interpolation and pass a validated params dict when executing, or—if SurrealDB parameterization isn't available—perform strict defensive validation/conversion: ensure query_embedding is explicitly converted to a List[float] and threshold to a float before embedding, and only then format a safe, canonical representation; update the call that executes this query to use the new params and reference the qa_history table and vector::similarity::cosine usage accordingly.

coderabbitai · 2026-03-15T11:12:11Z

+            word_count = len(full_content.split()) if full_content else 0
+
+            min_length = settings.web_search_min_content_length
+            if word_count > 0 and len(full_content) < min_length:
+                status = WebSearchStatus.LOW_QUALITY
+            else:
+                status = WebSearchStatus.SUCCESS


⚠️ Potential issue | 🟡 Minor

Edge case: word_count is zero when full_content is falsy, but status defaults to SUCCESS.

When full_content is empty/None, word_count will be 0, but the condition word_count > 0 and len(full_content) < min_length is False, so status defaults to SUCCESS. An empty result marked as SUCCESS may be misleading.

♻️ Suggested fix

word_count = len(full_content.split()) if full_content else 0 min_length = settings.web_search_min_content_length - if word_count > 0 and len(full_content) < min_length: + if not full_content or word_count == 0: + status = WebSearchStatus.LOW_QUALITY + elif len(full_content) < min_length: status = WebSearchStatus.LOW_QUALITY else: status = WebSearchStatus.SUCCESS

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/websearch/tavily_search.py` around lines 95 - 101, The current logic can mark empty or None full_content as SUCCESS; update the branching around word_count/full_content so empty content is explicitly handled: if full_content is falsy or word_count == 0 set status to WebSearchStatus.NO_CONTENT (or WebSearchStatus.LOW_QUALITY if NO_CONTENT doesn't exist), else compute min_length = settings.web_search_min_content_length and if len(full_content) < min_length set status = WebSearchStatus.LOW_QUALITY, otherwise set status = WebSearchStatus.SUCCESS; adjust the block around variables full_content, word_count, min_length, and the WebSearchStatus assignments accordingly.

Migrate web search to Tavily and refactor imports

d59ff5e

CLoaKY233 self-assigned this Mar 15, 2026

Copilot AI review requested due to automatic review settings March 15, 2026 10:01

Copilot started reviewing on behalf of CLoaKY233 March 15, 2026 10:02 View session

This comment was marked as resolved.

Sign in to view

CLoaKY233 added 3 commits March 15, 2026 15:43

Remove factory scaffolding and docs

bc4c586

Refactor Tavily search and remove dev deps

2c8e3ed

Add Web search cache purge and QA cache clear

81937cb

coderabbitai Bot reviewed Mar 15, 2026

View reviewed changes

		qa_cache_enabled: bool = Field(default=True)
		qa_cache_similarity_threshold: float = Field(default=0.8)

		response = self.client.complete(**params) # type: ignore[attr-defined]

Conversation

CLoaKY233 commented Mar 15, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

CLoaKY233 commented Mar 15, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 15, 2026 •

edited

Loading