Handle Overflow for big patch RPM #274
✅ Snyk checks have passed. No issues have been found so far.
db91c55 to 81df18e
/test vulnerability-analysis-on-pr
Signed-off-by: Zvi Grinberg <zgrinber@redhat.com>
zvigrinberg
left a comment
Hi @RedTanny,
Good job overall. Only one issue is left, regarding performance and efficiency.
PTAL.
Thanks!
# Has actual content - double loop over patch chunks and tool output chunks
tool_chunks = truncate_tool_output_list(tool_output_for_llm, tool_used, max_tokens=1000)
all_findings: list[str] = []
best_tool_outcome = ""

for chunk in chunks:
    comp_prompt = L1_COMPREHENSION_PROMPT.format(
        vuln_id=vuln_id,
        target_package=target_package_name,
        vulnerability_intel=intel_formatted,
        raw_patch_diff=raw_patch_diff,
        tool_used=tool_used,
        tool_input=tool_input_detail,
        last_thought=last_thought_text,
        tool_output=chunk,
    )
    chunk_findings = await invoke_comprehension(
        structured_comprehension_llm,
        comp_prompt,
        tool_used,
        tool_input_detail,
        chunk,
        agent_label="L1",
    )
    all_findings.extend(chunk_findings.findings)
    # Keep tool_outcome from chunk with actual findings (not FAILED)
    if not best_tool_outcome or (
        chunk_findings.findings and
        not any("FAILED" in f for f in chunk_findings.findings)
    ):
        best_tool_outcome = chunk_findings.tool_outcome
for patch_chunk in patch_diff_chunks:
    for tool_chunk in tool_chunks:
        logger.debug(
            "Comprehension token breakdown: "
            "intel=%d, patch_chunk=%d, tool_chunk=%d, last_thought=%d, "
            "tool_input=%d, total_parts=%d",
            count_tokens(intel_formatted),
            count_tokens(patch_chunk),
            count_tokens(tool_chunk),
High: Double loop (patch_chunks × tool_chunks) causes quadratic LLM invocations and duplicate/contradictory findings
This nested loop calls the comprehension LLM for every combination of patch chunk and tool output chunk. With MAX_PATCH_CHUNKS=2 and 3 tool chunks, that's 6 LLM calls instead of the previous 3. Each call sees a different patch slice paired with the same tool output, which produces:
- Duplicate findings: The same grep match analyzed against overlapping patch context yields the same finding twice
- Contradictory verdicts: One patch chunk shows the fix, another doesn't — the LLM may conclude PATCHED from one and NOT_PATCHED from the other
- Increased cost: 2× LLM calls with no deduplication
The patch diff serves as background context (what changed), while the tool output is the primary input (what the grep found). Taking the cross product of the two adds no value.
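To make the call-count arithmetic concrete, here is a minimal sketch that counts how many comprehension calls each loop structure would make. The helper `count_llm_calls` and the placeholder chunk strings are hypothetical and exist only for illustration; no LLM is invoked.

```python
from itertools import product


def count_llm_calls(patch_chunks, tool_chunks, nested):
    """Return how many comprehension calls a loop structure would issue.

    Hypothetical helper for illustration only; it counts iterations
    instead of invoking an LLM.
    """
    if nested:
        # Nested loop: one call per (patch_chunk, tool_chunk) pair.
        return sum(1 for _ in product(patch_chunks, tool_chunks))
    # Single loop: one call per tool chunk, with one shared patch string.
    return len(tool_chunks)


# Placeholder data matching the numbers in this comment:
# MAX_PATCH_CHUNKS=2 and 3 tool output chunks.
patch_chunks = ["patch-part-1", "patch-part-2"]
tool_chunks = ["grep-part-1", "grep-part-2", "grep-part-3"]

nested_calls = count_llm_calls(patch_chunks, tool_chunks, nested=True)
single_calls = count_llm_calls(patch_chunks, tool_chunks, nested=False)
print(nested_calls, single_calls)  # 6 3
```

The nested count grows as the product of the two chunk counts, so a larger patch or a longer grep output each multiplies the number of calls.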
Suggestion: Keep a single truncated patch string, loop only over tool chunks (preserving the original single-loop structure):
# Simplify get_relevant_hunks back to returning a single truncated string:
def get_relevant_hunks(parsed_patch, grep_query, max_tokens=MAX_PATCH_TOKENS) -> str:
    # ... same file matching logic ...
    full_diff = "\n".join(all_file_diffs)
    if count_tokens(full_diff) <= max_tokens:
        return full_diff
    return _truncate_diff_by_tokens(full_diff, max_tokens)

# Then in observation_node, keep the original single loop:
raw_patch_diff = ""
if tool_used == "Source Grep" and parsed_patch:
    raw_patch_diff = get_relevant_hunks(parsed_patch, tool_input_detail)

tool_chunks = truncate_tool_output_list(tool_output_for_llm, tool_used, max_tokens=1000)
for tool_chunk in tool_chunks:
    comp_prompt = L1_COMPREHENSION_PROMPT.format(
        ...
        raw_patch_diff=raw_patch_diff,
        tool_output=tool_chunk,
    )
    # ... invoke_comprehension ...

This solves the token overflow (patch is truncated to fit) while avoiding the quadratic blowup and duplicate findings. The LLM gets a truncated but coherent patch context rather than seeing partial slices cross-multiplied with every tool chunk.
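The suggestion relies on `_truncate_diff_by_tokens`, whose body is not shown in this thread. As a rough sketch of what such a helper might do, the version below keeps whole diff lines until a token budget is spent. The name `truncate_diff_by_tokens`, the whitespace-based `count_tokens` stand-in, and the sample diff are all assumptions for illustration; the project's real tokenizer and truncation strategy may differ.

```python
def count_tokens(text: str) -> int:
    """Stand-in token counter (assumption): one token per whitespace-separated word.

    The project's real count_tokens presumably uses a model tokenizer.
    """
    return len(text.split())


def truncate_diff_by_tokens(diff: str, max_tokens: int) -> str:
    """Keep whole lines from the start of the diff until the budget is spent.

    Hypothetical sketch: cutting on line boundaries avoids handing the
    LLM a half-written diff line.
    """
    kept = []
    used = 0
    for line in diff.splitlines():
        cost = count_tokens(line)
        if used + cost > max_tokens:
            break  # the next line would exceed the budget
        kept.append(line)
        used += cost
    return "\n".join(kept)


# Made-up diff used only to exercise the helper.
sample_diff = "\n".join([
    "--- a/src/util.c",
    "+++ b/src/util.c",
    "@@ -10,3 +10,4 @@",
    "-    len = strlen(buf);",
    "+    len = strnlen(buf, sizeof(buf));",
])

short = truncate_diff_by_tokens(sample_diff, max_tokens=8)
print(short)
```

With this stand-in counter, a budget of 8 keeps the two file header lines and the hunk header and drops the changed lines, which shows the trade-off: a simple head truncation fits the prompt limit but can cut the very hunk the grep matched, so a real implementation may want to prioritize matching hunks first.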
BugFix observation fail send prompt pass the 8k limit when patch diff is big