CCA pre-filters (Java import, C arg-count, Go sub-package), Java CCA cycle detection & perf, RPM version guard, ~40 bug fixes, ~200 new tests by tmihalac · Pull Request #261 · RHEcosystemAppEng/exploit-iq-agent

tmihalac · 2026-06-23T14:46:24Z

Java CCA

Add _can_reference_class() import-based pre-filter (simple class name, wildcard import, same package, same artifact) to eliminate irrelevant uber-JAR candidates before expensive type resolution
Only filter third-party candidates; application code (root docs) always passes to avoid false negatives from polymorphic interface calls
Add DFS cycle detection guard in get_relevant_documents to prevent infinite loops from self-recursive or mutually recursive method calls
Switch logger.debug in __check_identifier_resolved_to_callee_function_package from f-strings to %s lazy formatting
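A rough sketch of the four-way import check, with application code exempted as described. The function name, its parameters, and the regex-based import parsing are assumptions made for illustration; `_can_reference_class` in the PR may extract packages and imports differently. The simple-name check deliberately over-accepts, since two classes can share a simple name. That keeps the filter biased toward false positives rather than false negatives.

```python
import re

# Matches "import a.b.C;", "import a.b.*;" and "import static a.b.C.m;"
_IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+(?:\.\*)?)\s*;", re.MULTILINE)


def can_reference_class(declaring_fqcn: str, candidate_source: str,
                        candidate_package: str, candidate_artifact: str,
                        declaring_artifact: str, is_root_doc: bool) -> bool:
    """Cheap visibility check run before expensive type resolution.

    Hypothetical signature: True when the candidate file could plausibly
    reference the class that declares the callee.
    """
    # Application code always passes: polymorphic interface calls can reach
    # the callee without importing its declaring class.
    if is_root_doc:
        return True
    declaring_package, _, simple_name = declaring_fqcn.rpartition(".")
    # Same package: the class is visible without any import.
    if candidate_package == declaring_package:
        return True
    # Same artifact (JAR): keep the candidate.
    if candidate_artifact == declaring_artifact:
        return True
    for imported in _IMPORT_RE.findall(candidate_source):
        # Wildcard import of the declaring package.
        if imported.endswith(".*") and imported[:-2] == declaring_package:
            return True
        # Any import whose simple name matches (conservative, may over-accept).
        if imported.rsplit(".", 1)[-1] == simple_name:
            return True
    return False
```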

C CCA

Add argument-count pre-filter to C parser's search_for_called_function to reject cross-package false positives where a same-named function has different arity (e.g. rsync read_byte(1) vs PostgreSQL read_byte(3))

Go CCA

Add Go sub-package awareness to Function Locator (_extract_go_subpackage, _go_subpackage_flow_control)
Add CCA sub-package filtering via resolve_subpackage_to_module in Go parser and tree fallback in chain_of_calls_retriever
Add Go sub-package enrichment from patches in intel_utils (extract_go_subpackages_from_patch)
Add Go sub-package prefix matching to Rule 8 so github.com/lib/foo/bar matches target github.com/lib/foo
Fix Go FL short_name dict collision: store list of packages per short name instead of overwriting

CCA / all ecosystems

Pre-index documents in chain_of_calls_retriever.__init__ (sort_docs, _root_docs, _source_path_index) for O(1) lookups in get_possible_docs instead of scanning all documents
Optimize _is_doc_excluded to compare source path (cheap) before page content (expensive)
Deduplicate parents list in non-Java CCA tree_dict to prevent duplicate entries from dependency tree builder
Add distinct "function not found" message when CCA returns empty call_hierarchy_list so agent distinguishes missing function from unreachable function
Escape regex metacharacters in Function Caller Finder query builder to handle identifiers containing dots, brackets, and plus sign
Reorder direct_parents in __find_caller_function_dfs so root-level packages are searched before library packages, fixing nondeterministic JS transitive search timeouts
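Escaping identifiers before embedding them in a pattern is what Python's `re.escape` is for. A minimal sketch of a call-site pattern builder; `build_call_pattern` and the lookbehind are illustrative choices, not necessarily the query syntax the Function Caller Finder uses.

```python
import re


def build_call_pattern(identifier: str) -> re.Pattern[str]:
    """Regex that finds `identifier(` as a call, treating the identifier literally."""
    # re.escape neutralises '.', '[', ']', '+', '$' and other metacharacters;
    # the lookbehind stops "foo" from matching inside "myfoo(".
    return re.compile(r"(?<![\w$])" + re.escape(identifier) + r"\s*\(")
```

Without escaping, `a.b+c` would be read as "a, any character, one or more b, c" and would also match unrelated text such as `aXbbc(`.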

JS parser fixes

Cap get_function_name regex searches to first 2000 chars to prevent catastrophic backtracking on huge functions
Fix is_comment_line to handle *-prefixed JSDoc continuation lines vs generator *method() syntax
Fix _extract_class_name regex to support $ in JS identifiers (\w+ → [\w$]+)
Fix resolve_chain backreference [^\1] in string pattern (changed to .*?)
Remove unused is_multiline parameter from _parse_declarations

RPM checker

Emit TARGET_IN_VULNERABLE_RANGE in VulnerabilityIntel.format_for_prompt() so the L1 agent sees the field referenced by the version-based fallback rules
Add VERSION GUARD clause to Case B sys prompt: when target is in vulnerable range, a grep match alone is not sufficient to conclude PATCHED — the fix must be at the exact CVE location
Shorten cve_verify_vuln_package LLM response instruction to one sentence

Config scanner

Add _CONFIG_DIR_ALLOWED_EXTENSIONS allowlist for directory-matched files, filtering out .js/.css/.map/.java/.py from config collection
Add build/tool config patterns: pyproject.toml, setup.cfg, tox.ini, tsconfig.json, .eslintrc.json, Makefile, CMakeLists.txt, meson.build
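A sketch of how the allowlist and the name patterns can interact. The file-name patterns come from the lists above; the directory names and the exact extension allowlist are assumptions for illustration, not the contents of `_CONFIG_DIR_ALLOWED_EXTENSIONS`. Using `fnmatch.fnmatchcase` on the whole name also mirrors the `re.match` to `fullmatch` fix.

```python
import fnmatch
from pathlib import PurePosixPath

# File-name patterns taken from the description above.
CONFIG_FILE_PATTERNS = (
    "pyproject.toml", "setup.cfg", "tox.ini", "tsconfig.json", ".eslintrc.json",
    "Makefile", "CMakeLists.txt", "meson.build", "docker-compose*.yaml",
)
# Hypothetical directory names and extension allowlist, for illustration only.
CONFIG_DIR_NAMES = {"config", "conf", "resources"}
CONFIG_DIR_ALLOWED_EXTENSIONS = {".xml", ".yaml", ".yml", ".json", ".properties", ".conf", ".ini", ".toml"}


def is_config_file(path: str) -> bool:
    p = PurePosixPath(path)
    # fnmatchcase must match the whole file name, like re.fullmatch.
    if any(fnmatch.fnmatchcase(p.name, pattern) for pattern in CONFIG_FILE_PATTERNS):
        return True
    # Matched only because of its directory: require an allowlisted extension,
    # or no extension at all (extensionless files are still accepted).
    if any(part in CONFIG_DIR_NAMES for part in p.parts[:-1]):
        return p.suffix == "" or p.suffix in CONFIG_DIR_ALLOWED_EXTENSIONS
    return False
```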

Source code bug fixes (~40)

Fix dep_tree.py missing comma causing --pythonvenv_python string concatenation
Fix dep_tree.py C/C++ detect_ecosystem walk not using _WALK_EXCLUDE_DIRS
Fix dep_tree.py remove unused _ensure_venv method
Fix dep_tree.py deduplicate types- package candidates via dict.fromkeys
Fix c_segmenter_custom.py remove_comments stripping patterns inside string literals
Fix c_lang_function_parsers.py debug print statement left in production code
Fix golang_functions_parsers.py len(declaration_parts) == (2 or 3) comparison, which does not test for 2 or 3 parts as intended
Fix golang_functions_parsers.py no-op re.search("") call
Fix golang_functions_parsers.py is_package_imported raw string split, missing quote stripping, and unescaped identifier in regex
Fix golang_functions_parsers.py is_same_package crash on empty input
Fix python_functions_parser.py is_same_package returning True for two empty strings
Fix python_segmenters_with_classes_methods.py annotating all methods with last class name and skipping async def methods
Fix source_code_git_loader.py safe.directory guard unnecessarily gated on clone_url
Fix brew_downloader.py returning path with zero downloads and extracting SRPM from cache path instead of target path
Fix configuration_scanner.py re.match allowing partial filename matches (fullmatch), max_results=0 returning 1 result, cache race condition on concurrent eviction, and missing docker-compose*.yaml pattern
Fix import_usage_analyzer.py empty short_name matching everything
Fix async_http_utils.py off-by-one in retry count, consumer errors caught by retry loop, retry_on_client_errors overridden by Retry-After check, negative sleep from past X-RateLimit-Reset, and missing @functools.wraps
Fix function_name_locator.py python_flow_control crash on non-function documents, Go versioned module short-name collision, and get_function_name ValueError exception handling
Fix git_commit_searcher.py _rank_results mutating confidence in-place
Fix git_repo_manager.py double-wrapping GitCommandError
Fix intel_utils.py parse_cpe checking split_cpe[5] instead of split_cpe[10] for system
Fix llm_engine_utils.py assert False in production code replaced with RuntimeError
Fix repo_resolver.py case-sensitivity bug in normalize_package_name dropping original case for mixed-case JSON keys
Fix serp_api_wrapper.py key index not reset after full rotation, dead max_retries field, and callers passing removed field
Fix csaf_generator.py GHSA description dropped when no pre-existing note and notes appended with text: None
Fix web_patch_fetcher.py missing asyncio import, _is_commit_url false positive on /c/ outside kernel.org, dropping Gitiles commit URLs, yarl double-encoding %5E%21, and rewrite _fetch_gitiles_patch from sync requests to async aiohttp
Fix prompting.py build_tool_descriptions missing FL, CONFIG, IUA, GREP entries
Fix java_functions_parsers.py _count_call_args treating < comparison as generic bracket
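For the golang_functions_parsers.py comparison fix: Python evaluates `(2 or 3)` to `2`, so `len(parts) == (2 or 3)` only accepts two parts, while the unparenthesized `len(parts) == 2 or 3` is always truthy. Either way, a membership test states the intent directly. `has_expected_part_count` is a hypothetical name.

```python
def has_expected_part_count(parts: list[str]) -> bool:
    # `(2 or 3)` short-circuits to 2, so `== (2 or 3)` would never accept 3 parts;
    # a membership test covers both intended lengths.
    return len(parts) in (2, 3)
```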

Build & infra

Parallelize source JAR extraction using ThreadPoolExecutor with cgroup-aware worker count
Add -Dmaven.artifact.threads=10 to all mvn dependency:copy-dependencies invocations in dep_tree.py
Set MAVEN_OPTS with -Dmaven.repo.local on shared PVC in on-cm-runner.yaml, on-pull-request.yaml, and exploit_iq_service.yaml
Add GOCACHE env var to on-cm-runner.yaml, on-pull-request.yaml, exploit_iq_service.yaml
Increase batch runner CPU request/limit from 1/2 cores to 3/3 cores
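A sketch of a cgroup-aware worker count, assuming cgroup v2, where /sys/fs/cgroup/cpu.max holds `<quota> <period>` or `max <period>`. Rounding a fractional quota up and capping at `os.cpu_count()` are choices made here, not necessarily what `_available_cpus` does; `cpus_from_cpu_max`, `available_cpus`, and `extract_all` are hypothetical names.

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def cpus_from_cpu_max(content: str, host_cpus: int) -> int:
    """Translate cgroup v2 cpu.max contents into a worker count."""
    quota, _, period = content.strip().partition(" ")
    if quota == "max" or not period:
        return host_cpus  # no CPU limit configured (or unexpected format)
    try:
        cpus = math.ceil(int(quota) / int(period))
    except (ValueError, ZeroDivisionError):
        return host_cpus
    return max(1, min(cpus, host_cpus))


def available_cpus(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    host = os.cpu_count() or 1
    try:
        return cpus_from_cpu_max(Path(cpu_max_path).read_text(), host)
    except OSError:
        return host  # not running under cgroup v2


def extract_all(jar_paths, extract_one):
    """Run extract_one over every JAR in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=available_cpus()) as pool:
        return list(pool.map(extract_one, jar_paths))
```

Threads are a reasonable fit here because JAR extraction is mostly I/O and zlib work, which releases the GIL for much of its runtime.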

Tests

Fix tautological assertions (disjunctive or, truthiness-only, conditional if-then-assert) and tests that reimplemented source logic instead of calling real functions
Consolidate 9 duplicate in-tree test files into canonical tests/ location
Add ~200 new tests: Java CCA (_can_reference_class, function_called_from_caller_body, __find_caller_function), JS parser/segmenter (backtracking cap, comment line, $ identifiers), C parser (argument-count filter, find_top_level_blocks), Go sub-package (FL, CCA, intel enrichment), RPM checker (format_for_prompt, VERSION GUARD), config scanner (allowlist, build patterns), and tools (FL, IUA, SerpAPI, git, async HTTP, web patch fetcher)

vbelouso · 2026-06-23T14:47:18Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scan Engine	Critical	High	Medium	Low	Total (0)
✅	Code Security	0	0	0	0	0 issues


tmihalac · 2026-06-23T15:18:46Z

/test-heavy

tmihalac · 2026-06-24T12:11:19Z

/test-heavy

tmihalac · 2026-06-24T18:01:10Z

/test-heavy

tmihalac · 2026-06-25T05:40:00Z

/test-heavy

tmihalac · 2026-06-26T14:22:19Z

/test vulnerability-analysis-on-pr

tmihalac · 2026-06-26T17:51:37Z

/test-heavy

tmihalac · 2026-06-28T06:25:20Z

/test vulnerability-analysis-on-pr

tmihalac · 2026-06-28T07:16:03Z

/test vulnerability-analysis-on-pr

tmihalac · 2026-06-28T07:44:31Z

/test vulnerability-analysis-on-pr

tmihalac · 2026-06-28T08:12:48Z

/test-heavy

- Replaced esprima-based JavaScript segmenter with tree-sitter for reliable parsing of modern JS syntax (optional chaining, nullish coalescing, top-level await)
- Fixed JS function name extraction: keyword filtering, position-aware matching, redundant pattern removal, generator/TypeScript/anonymous-export support
- Added build-artifact filtering (should_skip) that excludes app-level dist/, build/static/, .min.js while preserving node_modules/*/dist/ as legitimate third-party source
- Added empty-name guards in CCA BFS to prevent documents with unextractable function names from entering call-chain analysis
- Fixed _get_function_calls regex to detect calls through optional chaining (obj?.method())

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
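The build-artifact rule can be sketched as a path filter. This assumes POSIX-style relative paths and treats .min.js as skipped everywhere, while dist/ and build/static/ are skipped only outside node_modules; the PR's should_skip may draw these lines differently.

```python
from pathlib import PurePosixPath


def should_skip(path: str) -> bool:
    """True for app-level build output that should not be segmented."""
    p = PurePosixPath(path)
    if p.name.endswith(".min.js"):
        return True  # minified bundles carry no useful function structure
    dirs = p.parts[:-1]  # directory components only
    if "node_modules" in dirs:
        return False  # node_modules/<pkg>/dist/ is legitimate third-party source
    if "dist" in dirs:
        return True
    # build/static/ as two consecutive directory components.
    return any(a == "build" and b == "static" for a, b in zip(dirs, dirs[1:]))
```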


formatting
- Add _can_reference_class() to JavaChainOfCallsRetriever for 4-way import visibility check (simple class name, wildcard import, same package, same artifact)
- Apply import pre-filter in _get_possible_docs via optional declaring_fqcn/callee_file_name/code_documents params
- Pass declaring FQCN from __find_caller_function to _get_possible_docs to eliminate irrelevant uber-JAR candidates before expensive type resolution
- Only filter third-party candidates; application code (root docs) always passes to avoid false negatives from polymorphic interface calls
- Add DFS cycle detection guard in get_relevant_documents to prevent infinite loops from self-recursive or mutually recursive method calls
- Switch logger.debug in __check_identifier_resolved_to_callee_function_package from f-strings to %s lazy formatting to avoid string construction when debug logging is disabled

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

escaping, and CCA empty-result guidance
- Deduplicate parents list in non-Java CCA tree_dict to prevent duplicate entries from dependency tree builder
- Add Go subpackage prefix matching to Rule 8 so "github.com/lib/foo/bar" matches target "github.com/lib/foo"
- Add distinct "function not found" message when CCA returns empty call_hierarchy_list so agent distinguishes missing function from unreachable function
- Escape regex metacharacters in Function Caller Finder query builder to handle identifiers containing dots, brackets, and plus sign

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

- Fix dep_tree.py missing comma causing "--pythonvenv_python" string concatenation
- Fix dep_tree.py C/C++ detect_ecosystem walk not using _WALK_EXCLUDE_DIRS
- Fix c_segmenter_custom.py remove_comments stripping patterns inside string literals
- Fix c_lang_function_parsers.py debug print statement left in production code
- Fix golang_functions_parsers.py `len(declaration_parts) == (2 or 3)` always-true comparison
- Fix golang_functions_parsers.py no-op `re.search("")` call
- Fix golang_functions_parsers.py is_package_imported raw string split and missing quote stripping
- Fix golang_functions_parsers.py is_same_package crash on empty input
- Fix javascript_functions_parser.py is_comment_line missing block comment continuation (`*`)
- Fix javascript_functions_parser.py _extract_class_name regex missing `$` in identifier class
- Fix javascript_functions_parser.py _parse_declarations unused is_multiline parameter
- Fix javascript_functions_parser.py backreference `[^\1]` in string pattern
- Fix python_functions_parser.py is_same_package returning True for two empty strings
- Fix python_segmenters_with_classes_methods.py annotating all methods with last class name
- Fix python_segmenters_with_classes_methods.py skipping async def methods
- Fix source_code_git_loader.py safe.directory guard unnecessarily gated on clone_url
- Fix brew_downloader.py returning path even with zero downloads
- Fix brew_downloader.py extracting SRPM from cache path instead of target path
- Fix configuration_scanner.py re.match allowing partial filename matches
- Fix configuration_scanner.py max_results=0 returning 1 result
- Fix configuration_scanner.py cache race condition on repo_key check outside lock
- Fix configuration_scanner.py missing docker-compose*.yaml pattern
- Fix import_usage_analyzer.py empty short_name matching everything
- Fix async_http_utils.py off-by-one in retry count (`<=` vs `<`)
- Fix async_http_utils.py consumer errors caught by retry loop instead of propagating
- Fix async_http_utils.py retry_on_client_errors overridden by Retry-After check
- Fix async_http_utils.py negative sleep from X-RateLimit-Reset in the past
- Fix async_http_utils.py missing @functools.wraps on retry_async wrapper
- Fix function_name_locator.py python_flow_control crash on non-function documents
- Fix function_name_locator.py Go versioned module short-name collision (v2, v3)
- Fix git_commit_searcher.py _rank_results mutating confidence in-place
- Fix git_repo_manager.py double-wrapping GitCommandError
- Fix intel_utils.py parse_cpe checking split_cpe[5] instead of split_cpe[10] for system
- Fix llm_engine_utils.py assert False in production code replaced with RuntimeError
- Fix repo_resolver.py case-sensitivity bug in normalize_package_name
- Fix serp_api_wrapper.py key index not reset after full rotation
- Fix serp_api_wrapper.py dead max_retries field
- Fix csaf_generator.py GHSA description dropped when no pre-existing note
- Fix csaf_generator.py notes appended with text: None when summary/justification missing
- Fix web_patch_fetcher.py missing asyncio import for TimeoutError catch
- Fix web_patch_fetcher.py _is_commit_url false positive on /c/ outside kernel.org
- Fix web_patch_fetcher.py dropping Gitiles commit URLs from candidates
- Fix prompting.py build_tool_descriptions missing FL, CONFIG, IUA, GREP entries

Test correctness fixes:
- Replace tautological assertions (disjunctive or, truthiness-only, conditional if-then-assert)
- Rewrite tests that reimplemented source logic instead of calling real functions
- Fix mock searcher ignoring tantivy query parameter in IUA tests
- Fix test_stub_only_triggers_pypi_fetch swallowing all exceptions via try/except pass
- Fix test_clone_failure_cleans_temp_dir vacuously-passing assertion
- Fix test_consumer_error_propagates using overly broad pytest.raises(Exception)
- Fix test_optional_chaining_preservation asserting on input string not parsed output
- Fix test_remove_comments_string_literal wrong docstring and tautological assertion
- Fix test_key_rotation not verifying actual key sent in HTTP request
- Fix test_all_tools_produce_7_descriptions omitting FL, CONFIG, IUA, GREP
- Fix conditional assertion in git_commit_searcher silently passing on None
- Fix test_third_party_docs weak assertion not verifying actual jar key
- Fix test_llm_engine_utils disjunctive or assertion masking wrong return value

Agent/pipeline coverage:
- Add pre_process_node tests for ReachabilityAgent and CodeUnderstandingAgent
- Add _postprocess_results exception handling tests
- Add dispatch_question exception fallback and build_routing_prompt integration tests
- Add Rule 8 vs Rule 9 priority interaction test
- Add thought_node actions-is-None and observation_node truncation/pruning tests
- Add _build_tool_guidance_for_ecosystem per-ecosystem filtering tests

Java CCA coverage:
- Add function_called_from_caller_body tests (24 cases)
- Add extract_from_query, infer_class_name_and_package_name tests
- Add is_java_fqcn, extract_maven_artifact, _is_doc_excluded tests
- Add __find_caller_function and __find_initial_function direct tests

JS parser/segmenter coverage:
- Add search_for_called_function branch tests
- Add is_valid, create_map_of_local_vars, is_exported_function tests
- Add _get_tree caching, should_skip, nested class extraction tests

C segmenter coverage:
- Add find_top_level_blocks, remove_macro_blocks, extract_define_functions tests

Go/Python/C parser coverage:
- Add is_tree_key_match, get_function_name, is_package_imported edge case tests
- Add Python utility method and class-without-parens tests
- Add C get_package_names, filter_docs, document_imports_package tests

Tools coverage:
- Add FL stdlib_cache, flow_control, singleton isolation tests
- Add config scanner cache eviction and concurrent access tests
- Add IUA query verification and comment-line counting tests
- Add git_commit_searcher _fetch_patch_via_http tests

External integration coverage:
- Add web patch fetcher parsing, Gitiles URL, commit extraction tests
- Add async HTTP retry limit, raise_for_status, 500 boundary tests
- Add SERP key exhaustion reset and error propagation tests
- Add git_repo_manager clone, fetch, concurrency, host validation tests

VEX/intel/version coverage:
- Add unexpected justification_label, RPM+NVD range, version check error tests
- Add package identifier utility method tests
- Add _is_safe_url, identify() with intel=None tests

LLM engine/checklist/prompting coverage:
- Add preprocess_engine_input, postprocess_engine_output branch tests
- Add build_no_vuln_packages_output justification tests
- Add generate_checklist, build_tool_descriptions per-tool tests

Remaining coverage:
- Add _ensure_venv, determine_python_version, vulnerability_intel_sanitizer tests
- Add source_classification, credential_client, transitive_detection tests
- Rewrite cve_fetch_patches tests to call real _arun

Test file consolidation:
- Merge 18 deleted test files into consolidated per-domain test modules
- Add 13 new focused test files for previously uncovered modules

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

- Fix serp_api_wrapper.py callers passing removed max_retries field (Pydantic ValidationError at runtime)
- Fix web_patch_fetcher.py _fetch_gitiles_patch yarl double-encoding %5E%21 (use yarl.URL(encoded=True))
- Fix java_functions_parsers.py _count_call_args treating < comparison as generic bracket (dual-comma fallback)
- Fix repo_resolver.py normalize_package_name dropping original case for mixed-case JSON keys (NetworkManager)
- Fix javascript_functions_parser.py is_comment_line classifying generator *method() as block comment
- Fix golang_functions_parsers.py is_package_imported unescaped identifier in regex (add re.escape)
- Fix configuration_scanner.py cache read outside lock causing KeyError on concurrent eviction (use .get())

Convention fixes:
- Fix test_java_cca.py _extract docstring referencing search_for_called_function
- Fix test_go_parser.py docstrings referencing fix history instead of describing behavior

Tests:
- Add SerpAPI extra_forbidden validation test
- Add Gitiles yarl.URL encoding preservation tests
- Add _count_call_args unbalanced angle bracket tests (comparison, bit shift, ternary)
- Add normalize_package_name mixed-case preservation tests
- Add is_comment_line generator method vs block comment tests
- Add is_package_imported regex escape and substring rejection tests
- Add configuration scanner cache eviction safety tests

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>


packages are searched before library packages
- Fixes nondeterministic timeouts in JS transitive search (test_java_script_transitive_search_1 hung ~80% of runs)
- Root cause: parent order from _get_parents was nondeterministic; when a library package (e.g. @cyclonedx/cyclonedx-library) was iterated first, DFS entered intra-package call chains (package lists itself as own parent) and explored hundreds of branches before finding the root caller
- No search paths removed; only iteration order changed to prioritize root_project

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
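A minimal sketch of the ordering idea, assuming parents are package-name strings and `root_project` names the application. Python's `sorted` is stable, so keying on `p != root_project` moves the root to the front without reordering the libraries or dropping any path. Deduplication via `dict.fromkeys` (as in the parents-list dedup fix) is folded in for illustration; `order_parents` is a hypothetical name.

```python
def order_parents(parents: list[str], root_project: str) -> list[str]:
    """Deduplicate parents, then put root_project first; other order is preserved."""
    unique = list(dict.fromkeys(parents))  # keeps first-seen order
    # False sorts before True, and sorted() is stable for equal keys.
    return sorted(unique, key=lambda parent: parent != root_project)
```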


quality, misc fixes
- Add argument-count pre-filter to C parser's search_for_called_function to reject cross-package false positives where a same-named function has different arity (e.g. rsync read_byte(1 param) vs PostgreSQL read_byte macro with 3 args)
- Add _count_c_declared_params and _count_call_site_args helpers for the pre-filter
- Update search_for_called_function signature: callee_function is now a positional parameter (was keyword-only callee_function:Document = None)
- Add 6 new test cases for argument-count filtering (match, mismatch, variadic, no-callee, void-param) and 12 tests for the counting helpers
- Parallelize source JAR extraction using ThreadPoolExecutor with cgroup-aware worker count (_available_cpus helper reads /sys/fs/cgroup/cpu.max)
- Add -Dmaven.artifact.threads=10 to all mvn dependency:copy-dependencies, clean install, and depgraph-maven-plugin invocations in dep_tree.py
- Set MAVEN_OPTS with -Dmaven.repo.local on shared PVC in and exploit_iq_service.yaml for persistent Maven cache across runs
- Add "AVOID UNANSWERABLE QUESTIONS" section to checklist prompt to prevent runtime-state questions that static analysis tools cannot answer
- Shorten cve_verify_vuln_package LLM response instruction to one sentence
- Update test_agent.py: _build_observation_context now takes critical_context list parameter; add test_crit_context_merged_into_knowledge; update pre_process_node assertions for critical_context field
- Add CVE-2025-48734 comment on test_transitive_search_java_1
- Remove section-separator comments from test_c_parser.py

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>


tmihalac · 2026-06-28T21:11:07Z

/test vulnerability-analysis-on-pr

tmihalac · 2026-06-28T21:28:30Z

/test vulnerability-analysis-on-pr

tmihalac · 2026-06-28T21:34:00Z

/test vulnerability-analysis-on-pr

tmihalac · 2026-06-28T21:45:12Z

/test vulnerability-analysis-on-pr

tmihalac · 2026-06-28T22:12:56Z

/test-heavy


GUARD
- Emit TARGET_IN_VULNERABLE_RANGE (YES/NO) in VulnerabilityIntel.format_for_prompt() so the L1 agent can see the field referenced by the VERSION-BASED FALLBACK rules
- Add VERSION GUARD clause to Case B sys prompt CONCLUSION section: when TARGET_IN_VULNERABLE_RANGE is YES, a grep match alone is not sufficient to conclude PATCHED; the fix must be at the exact CVE location, not a similar pre-existing check
- Add VERSION GUARD phase to Case B thought instructions PHASE 3 VERDICT: when fix pattern was found but target is in vulnerable range, verify the match is the exact CVE fix or conclude VULNERABLE (version-based)
- Add comment explaining the CVE-2024-48957/libarchive triggering case for the Case B prompt changes
- Add 21 tests in test_vulnerability_intel_format.py covering format_for_prompt() field emission (TARGET_IN_VULNERABLE_RANGE, downstream patch, vulnerable/fix patterns, bitness, ordering), select_upstream_prompt_and_instructions routing, and Case B prompt content assertions

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

build/tool config patterns
- Add _CONFIG_DIR_ALLOWED_EXTENSIONS allowlist for files matched only by directory
- Fixes Keycloak issue where .js/.css/.map files under resources/ were collected
- Add build/tool config patterns: pyproject.toml, setup.cfg, tox.ini, tsconfig.json, .eslintrc.json, Makefile, CMakeLists.txt, meson.build
- Extensionless files in config dirs still accepted
- Update test_collects_files_in_config_dir to use .xml instead of .txt
- Add 10 new tests for allowlist, build tool patterns, and negative cases

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

improvements
- Merge 9 in-tree test files into canonical tests/ location, delete in-tree copies: test_configuration_scanner, test_credential_client, test_import_usage_analyzer, test_javascript_functions_parser, test_source_code_git_loader, test_transitive_detection, test_version_check, test_vulnerability_intel_sanitizer, test_web_patch_fetcher
- Config scanner: add _CONFIG_DIR_ALLOWED_EXTENSIONS allowlist for directory-matched files, filtering out .js/.css/.map/.java/.py etc. from config collection
- Config scanner: add build/tool config patterns (pyproject.toml, setup.cfg, tox.ini, tsconfig.json, .eslintrc.json, Makefile, CMakeLists.txt, meson.build)
- Add MAVEN_OPTS with persistent repo cache to on-pull-request.yaml
- Add 30+ new tests across merged files covering allowlist enforcement, build tool patterns, dependency manifests negative cases, and unique tests from in-tree files

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

tmihalac · 2026-06-29T10:40:54Z

/test vulnerability-analysis-on-pr

tmihalac · 2026-06-29T11:15:27Z

/test-heavy


tmihalac · 2026-06-30T07:00:19Z

/test-heavy

uber-jar threshold
- Add Go sub-package awareness to Function Locator (_extract_go_subpackage, _go_subpackage_flow_control)
- Add CCA sub-package filtering via resolve_subpackage_to_module in Go parser and tree fallback in chain_of_calls_retriever
- Add Go sub-package enrichment from patches in intel_utils (extract_go_subpackages_from_patch)
- Pass candidate_packages to enrich_vulnerable_functions_from_patch in cve_agent
- Add base resolve_subpackage_to_module to LangFunctionsParser (returns None for non-Go)
- Add GOCACHE env var to on-cm-runner.yaml, on-pull-request.yaml, exploit_iq_service.yaml
- Revert uber_jar_file_threshold from 1000 back to 600 in all config files
- Add 38 tests for Go sub-package fixes (FL feedback, CCA filtering, intel enrichment)

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

tmihalac · 2026-06-30T12:14:47Z

/test-heavy


tmihalac · 2026-06-30T15:43:58Z

/test vulnerability-analysis-on-pr

- Change build_short_go_package_name to store list of packages per short name instead of overwriting
- Search all matching packages in locate_functions via any() when short name resolves to multiple packages
- Add test for unrelated packages sharing the same short name (github.com/foo/util vs github.com/bar/util)
- Update existing short_name tests to expect list values and verify both versions preserved on collision

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
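A sketch of the list-valued short-name index, assuming packages are full Go import paths. Keying versioned modules by the name before /vN reflects the v2/v3 collision fix listed earlier; `build_short_name_index` and `path_in_any_package` are hypothetical names.

```python
import re
from collections import defaultdict

_MAJOR_VERSION = re.compile(r"v\d+")


def build_short_name_index(packages: list[str]) -> dict[str, list[str]]:
    """Map a short package name to every full package path that shares it."""
    index: dict[str, list[str]] = defaultdict(list)
    for pkg in packages:
        parts = pkg.rstrip("/").split("/")
        short = parts[-1]
        # Versioned modules (github.com/x/lib/v2) are keyed by the name before /vN.
        if len(parts) > 1 and _MAJOR_VERSION.fullmatch(short):
            short = parts[-2]
        index[short].append(pkg)  # append instead of overwriting
    return dict(index)


def path_in_any_package(source_path: str, short_name: str, index: dict[str, list[str]]) -> bool:
    """Check every package sharing the short name, using any()."""
    return any(source_path.startswith(pkg.rstrip("/") + "/") for pkg in index.get(short_name, []))
```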

tmihalac · 2026-06-30T16:23:11Z

/test-heavy

- Remove C-L, C-H, C-M, A-H, B-M coverage-tracking labels from comments, docstrings, and section headers
- Remove "Coverage gap tests:" and "Fix A/B/C:" prefixes from test section comments
- Keep descriptive text, only strip the label identifiers
- 22 files cleaned across tests/ and src/*/tests/

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

tmihalac · 2026-07-01T07:44:03Z

/test-heavy

zvigrinberg

Hi @tmihalac,

Please see the comments below.

zvigrinberg · 2026-07-01T10:00:04Z

                    return self.common_flow_control(function, package_docs)
                case Ecosystem.GO.value:
-                    return self.common_flow_control(function, package_docs)
+                    return self._go_subpackage_flow_control(function, package_docs, package)


Bug: module_path receives unresolved short name — sub-package disambiguation is dead code

The refactoring that introduced packages_to_search stopped reassigning package to the resolved full module path. This call:

return self._go_subpackage_flow_control(function, package_docs, package)

passes the original short name (e.g., 'protojson') as module_path. Inside _extract_go_subpackage, the check dir_path.startswith(norm_module + '/') can never match a short name against a full source path such as google.golang.org/protobuf/encoding/protojson/.... Every function therefore falls through to return module_path, which collapses all sub-packages into one bucket. As a result, len(subpkg_to_funcs) <= 1 is always true and the multi-sub-package logic never triggers.

Fix: Pass the resolved module path(s) from packages_to_search:

return self._go_subpackage_flow_control(function, package_docs, packages_to_search[0])

Or iterate over all resolved paths if multiple are possible.
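The failure mode can be reproduced with a small reconstruction of the check quoted above. `extract_go_subpackage` and `group_by_subpackage` are hypothetical stand-ins based only on the review text: with the full module path, two sub-packages form two groups, while with the short name 'protojson' everything collapses into a single bucket.

```python
import posixpath


def extract_go_subpackage(source_path: str, module_path: str) -> str:
    """Hypothetical reconstruction of the prefix check described in the review."""
    dir_path = posixpath.dirname(source_path)
    norm_module = module_path.rstrip("/")
    if dir_path == norm_module or dir_path.startswith(norm_module + "/"):
        return dir_path
    # Fallback: every file that does not sit under module_path lands in one bucket.
    return module_path


def group_by_subpackage(source_paths: list[str], module_path: str) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = {}
    for path in source_paths:
        groups.setdefault(extract_go_subpackage(path, module_path), []).append(path)
    return groups
```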

zvigrinberg · 2026-07-01T10:00:04Z

+
+def _count_call_site_args(function_body: str, func_name: str) -> int | None:
+    """Count arguments at the first call site of func_name(...) in function_body.
+    Returns None if the call site cannot be parsed."""


Bug: Comma counting ignores string/char literals — causes false rejections

_count_call_site_args iterates characters tracking only parenthesis depth:

for ch in args_str:
    if ch == '(':
        depth += 1
    elif ch == ')':
        depth -= 1
    elif ch == ',' and depth == 0:
        count += 1

This doesn't skip commas inside string or character literals. A call like func("error: a,b", x) is counted as 3 args instead of 2, causing a spurious mismatch against the declared parameter count and rejecting a valid call chain.

This is a common pattern in C code (format strings, error messages like fprintf(stderr, "expected %d, got %d", a, b)).

The Java counterpart _count_call_args has a related issue where < in comparison expressions (e.g., a < b) is treated as a generic angle-bracket opener.

zvigrinberg · 2026-07-01T10:00:04Z

+        )
+        result = list(close_matches)
+        result.append(guidance)
+        return result


Type mixing: Guidance string appended to function-name list

result = list(close_matches)
result.append(guidance)
return result

This appends a multi-line INFO string (e.g., "INFO: Matched functions exist in multiple sub-packages of 'module':\n subpkg1: func1\n...") to a list that otherwise contains only function names. The caller locate_functions returns this as "result": result, documented as [function_names].

While the LLM consumer may parse this gracefully, any programmatic downstream code that iterates over result treating every element as a valid function name will produce incorrect lookups or regex errors when it hits the guidance text (which contains spaces, colons, and newlines).

Suggestion: Return the guidance separately:

return {"functions": list(close_matches), "guidance": guidance}

Or log it instead of embedding it in the return value.

zvigrinberg · 2026-07-01T10:00:04Z

                            params["api_key"] = self._rotate_next_key()
                    else:
                        raise
+            self.__class__._serp_api_key_index = 0


Race condition: Resetting shared _serp_api_key_index outside the lock

self.__class__._serp_api_key_index = 0
raise Exception("All API keys exhausted")

_serp_api_key_index is a ClassVar shared across all instances. All other mutations (validate_serp_api_keys, _rotate_next_key) properly guard writes with _key_rotation_lock, but this reset happens outside the lock.

If a concurrent caller is mid-rotation (e.g., on key index 3 of 5), this unlocked reset to 0 causes it to re-try already-exhausted keys (getting 402/429 loops) or skip keys it hasn't tried.

Fix: Move inside the lock:

with self.__class__._key_rotation_lock:
    self.__class__._serp_api_key_index = 0
raise Exception("All API keys exhausted")

zvigrinberg · 2026-07-01T10:00:04Z

    def get_possible_docs(self, function_name_to_search: str, package: str, exclusions: list[Document],
                          sources_location_packages: bool,
                          target_class_names: frozenset[str],
                          method_exclusions: dict) -> (list[Document], bool):


Nit: Pre-index doesn't achieve algorithmic speedup — still O(unique_paths) linear scan

candidates = [doc for path, docs in self._source_path_index.items() if package in path for doc in docs]

This iterates all unique source paths doing a substring match — same algorithmic complexity as the old linear scan over all documents. The reduction from N (total docs) to U (unique paths) is a constant-factor improvement when files yield many functions, but it's not the O(1) lookup the pre-indexing pattern suggests.

Not a blocker — the constant-factor improvement plus the reordered search_token check before expensive is_function/_is_doc_excluded calls is a net positive. But if this becomes a bottleneck, an inverted index on path segments (mapping each package name to matching paths) would give true O(1) lookup.
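Here is a rough sketch of such an inverted index. It assumes source paths use '/' separators and that package is itself a '/'-separated path. SegmentIndex is a hypothetical name. Lookup cost depends on the size of the posting sets for the package's segments rather than on the number of unique paths:

```python
from collections import defaultdict
from collections.abc import Iterable


class SegmentIndex:
    """Hypothetical inverted index from '/'-separated path segments to source paths."""

    def __init__(self, paths: Iterable[str]) -> None:
        self._by_segment: defaultdict[str, set[str]] = defaultdict(set)
        for path in paths:
            for segment in path.split("/"):
                if segment:
                    self._by_segment[segment].add(path)

    def paths_for_package(self, package: str) -> set[str]:
        segments = [s for s in package.split("/") if s]
        if not segments:
            return set()
        # Start from the smallest posting set so the intersection stays cheap.
        postings = sorted((self._by_segment.get(s, set()) for s in segments), key=len)
        candidates = set(postings[0]).intersection(*postings[1:])
        # Segments may match out of order, so confirm contiguity on the few survivors.
        return {p for p in candidates if package in p}


idx = SegmentIndex([
    "vendor/github.com/lib/foo/a.go",
    "vendor/github.com/lib/foobar/b.go",
    "src/app/main.go",
])
print(idx.paths_for_package("github.com/lib/foo"))
```

This is not a drop-in replacement for package in path. A substring match also accepts vendor/github.com/lib/foobar/b.go for github.com/lib/foo, while the segment index matches whole segments only. A package string that ends mid-segment would no longer match at all.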

RedTanny · 2026-07-01T14:25:09Z

A few questions before we merge the VERSION GUARD changes:

What is the basis for this change?
Can you provide a specific CVE case where the current behavior caused an incorrect verdict? Specifically:
Without a concrete example demonstrating the problem, it's hard to evaluate whether this change is necessary or correctly scoped.

Did you run the modified prompts against actual RPM checker scenarios to verify:

In my experience, LLMs frequently ignore or misinterpret prompt instructions — especially conditional logic like "if X then require Y". Adding a VERSION GUARD clause sounds reasonable in theory, but:

The model may still conclude PATCHED on a grep match alone
The model may become overly conservative and mark everything VULNERABLE
The interaction between this guard and other prompt rules is unpredictable

Suggestion: before merging,

Find a real use case where this change is needed.

Verify that the prompt really works and that the LLM does not ignore it.

Run a regression test: I have a dataset of 10 cases that I can send to check that the changes do not introduce regressions.

zvigrinberg

Another cycle of review

zvigrinberg · 2026-07-02T10:57:25Z

+        if stripped.startswith('*'):
+            rest = stripped[1:].lstrip()
+            if rest and (rest[0].isalnum() or rest[0] in ('$', '_', '[')):
+                return False
+            return True


High: is_comment_line misclassifies most JSDoc continuation lines as non-comment

A line like * Returns the cached value has rest='Returns...' and rest[0]='R' is alphanumeric, so the method returns False. This means the majority of JSDoc prose lines survive the comment filter. At line 207, unfiltered JSDoc prose is scanned for call patterns, producing false-positive caller-callee edges when prose mentions function names (e.g., * Delegates to parseJSON).

The intent was to distinguish *method() generator syntax from * JSDoc text, but the heuristic catches far too much.

Suggestion: Check whether the line is inside a /* ... */ block comment rather than inspecting the character after *. Or use a narrower heuristic — generator methods start with * immediately followed by an identifier without a space:

    if stripped.startswith('*'):
        rest = stripped[1:]
        # Closing '*/' of a block comment
        if rest.startswith('/'):
            return True
        # Generator syntax: *methodName() (no space after *)
        if rest and not rest[0].isspace():
            return False
        # JSDoc continuation: * some text (space after *)
        return True
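The first suggestion is to track whether a line lies inside a /* ... */ block. It could look roughly like the sketch below. comment_line_flags is a hypothetical helper, not the PR's is_comment_line. It works on a whole file rather than on single lines, and that is what makes a generator method such as *entries() distinguishable from JSDoc prose:

```python
def comment_line_flags(lines: list[str]) -> list[bool]:
    """Flag each line that is a // comment or lies inside a /* ... */ block.

    Simplified sketch: comment markers inside string or regex literals are not
    handled, and a line starting with /* is flagged even if code follows */.
    """
    flags: list[bool] = []
    in_block = False
    for line in lines:
        stripped = line.strip()
        if in_block:
            flags.append(True)
            if "*/" in stripped:
                in_block = False
        elif stripped.startswith("//"):
            flags.append(True)
        elif stripped.startswith("/*"):
            flags.append(True)
            # Stay in block mode unless the comment closes on the same line.
            in_block = "*/" not in stripped[2:]
        else:
            flags.append(False)
    return flags


source = [
    "/**",
    " * Delegates to parseJSON",
    " */",
    "class Reader {",
    "  *entries() {",
    "    yield parseJSON(x);",
    "  }",
    "}",
]
print(comment_line_flags(source))
```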

zvigrinberg · 2026-07-02T10:57:25Z

                if matching and matching.group(0):
                    import_line = code_content[matching.start():]
                    import_package_line = import_line[:import_line.find(os.linesep)].strip()


High: is_package_imported regex adds trailing .*, matching identifier anywhere in import path

Old regex: import ['"].*{identifier}['"] — identifier at end of path.
New regex: import ['"].*{esc_id}.*['"] — identifier anywhere in path.

So identifier='json' now matches import "github.com/json-iterator/go" because json appears mid-path. This creates false-positive import matches and allows unrelated packages to pass the call-chain filter.

The re.escape(identifier) fix is correct and needed — but the trailing .* changes the matching semantics.

Suggestion: Keep the anchor at the end, just add the escape:

    esc_id = re.escape(identifier)
    matching = re.search(rf"import [\'\"].*{esc_id}[\'\"]", code_content)

Or if you need to match identifier as a path segment (not just at the end), use a more precise pattern:

matching = re.search(rf"import [\'\"].*[/]{esc_id}[\'\"]" , code_content)

@zvigrinberg This comment concerns existing code, not code added in this PR. The new escaping of identifier should be adopted, but the greedy trailing .* quantifier should be kept as is.
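The difference between the two regex forms, and the effect of re.escape, can be checked directly. import_matches is a hypothetical wrapper around the patterns quoted above, with the quote character class pulled into a constant for readability:

```python
import re

QUOTE = "[\"']"  # a single or double quote


def import_matches(code: str, identifier: str, trailing_wildcard: bool) -> bool:
    esc_id = re.escape(identifier)  # '.' in "v1.2" becomes a literal dot
    tail = ".*" if trailing_wildcard else ""
    return re.search(rf"import {QUOTE}.*{esc_id}{tail}{QUOTE}", code) is not None


go_code = 'import "github.com/json-iterator/go"'
print(import_matches(go_code, "json", trailing_wildcard=False))  # anchored form
print(import_matches(go_code, "json", trailing_wildcard=True))   # form with trailing .*
```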

zvigrinberg · 2026-07-02T10:57:25Z

+            elif ch == '>' and depth_a > 0:
+                depth_a -= 1
+            elif ch == ',' and depth_p == 0 and depth_b == 0:
+                commas_without_angles += 1
+                if depth_a == 0:
+                    commas_with_angles += 1
+        if depth_a == 0:
+            return commas_with_angles + 1
+        return commas_without_angles + 1


High: _count_call_args miscounts when </> are comparison operators balancing across a comma

For assertTrue(x < 10, y > 5):

< increments depth_a to 1

At the comma, depth_a != 0, so only commas_without_angles is incremented (not commas_with_angles)

> decrements depth_a back to 0

Since depth_a == 0, returns commas_with_angles + 1 = 1 instead of the correct 2

This causes the arg-count pre-filter to reject valid call sites wherever assertions, comparisons, or ternaries use </> across argument boundaries.

Suggestion: Use a two-pass approach — first try treating </> as angle brackets. If depth_a goes negative at any point (a > without a preceding <), restart treating all </> as operators and count only parenthesis/bracket depth:

    # If depth_a ever goes negative, it means > was a comparison, not a bracket close.
    # Fall back to ignoring angle brackets entirely.
    if depth_a < 0:
        return _count_ignoring_angles(s, open_idx, close_idx)
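One caveat applies to the fallback above. Because '>' only decrements depth_a when depth_a > 0, depth_a can never go negative. In assertTrue(x < 10, y > 5) it also returns to 0, so the negative-depth check would not fire for the cited example. A more conservative alternative is to compute both readings and accept the declared arity if it matches either. It is sketched below under the assumption that the pre-filter only needs to avoid false rejections. candidate_arg_counts is a hypothetical helper that takes the text between the call's parentheses:

```python
def candidate_arg_counts(args: str) -> set[int]:
    """Possible argument counts for the text between a call's parentheses.

    '<' and '>' may be generic brackets or comparison operators, so both
    readings are counted. String and char literals are not handled here.
    """
    if not args.strip():
        return {0}
    depth_p = 0                # (), [] and {} nesting
    depth_a = 0                # <> nesting, if read as generic brackets
    commas_any = 0             # top-level commas, '<'/'>' read as operators
    commas_outside_angles = 0  # top-level commas, '<'/'>' read as brackets
    for ch in args:
        if ch in "([{":
            depth_p += 1
        elif ch in ")]}":
            depth_p -= 1
        elif ch == "<":
            depth_a += 1
        elif ch == ">" and depth_a > 0:
            depth_a -= 1
        elif ch == "," and depth_p == 0:
            commas_any += 1
            if depth_a == 0:
                commas_outside_angles += 1
    counts = {commas_any + 1}
    if depth_a == 0:
        # Angles balanced, so the generic-bracket reading is also plausible.
        counts.add(commas_outside_angles + 1)
    return counts


print(candidate_arg_counts("x < 10, y > 5"))
print(candidate_arg_counts("new HashMap<String, Integer>(), 3"))
```

The pre-filter would then reject a call site only when declared not in candidate_arg_counts(...). It trades a little filtering power for no false negatives on comparisons.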

zvigrinberg · 2026-07-02T10:57:25Z

+    pattern = re.compile(r'\b' + re.escape(func_name) + r'\s*\(')
+    m = pattern.search(function_body)
+    if not m:
+        return None


Medium: C arg-count pre-filter checks only the first call site — rejects valid matches when a same-named local function appears first

_count_call_site_args uses pattern.search(function_body) (first match only). The docstring confirms: "Count arguments at the first call site."

If a caller has init() (0 args, local helper) textually before init(ctx, cfg) (2 args, the real callee), re.search finds the wrong one. The arity mismatch causes return False, rejecting a valid caller-callee edge.

Suggestion: Use re.finditer and check ALL call sites — accept if ANY matches the declared param count:

    def _count_call_site_args(function_body: str, func_name: str) -> list[int]:
        """Count arguments at all call sites of func_name(...) in function_body."""
        pattern = re.compile(r'\b' + re.escape(func_name) + r'\s*\(')
        counts = []
        for m in pattern.finditer(function_body):
            count = _count_args_from_match(function_body, m)
            if count is not None:
                counts.append(count)
        return counts

Then in the caller (if no call site could be counted, skip the filter rather than reject):

    if call_arg_counts and declared not in call_arg_counts:
        return False
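The all-call-sites approach can be sketched end to end as follows. _count_args_from_match is not shown in the review, so this hypothetical count_call_site_args inlines a simple bracket-depth counter. It ignores commas inside string or character literals and does not expand macros. arity_compatible applies the accept-if-any rule:

```python
import re


def count_call_site_args(function_body: str, func_name: str) -> list[int]:
    """Argument count at every call site of func_name(...) in function_body."""
    pattern = re.compile(r"\b" + re.escape(func_name) + r"\s*\(")
    counts: list[int] = []
    for m in pattern.finditer(function_body):
        depth, commas, has_arg = 1, 0, False
        i = m.end()  # first character after '('
        while i < len(function_body) and depth > 0:
            ch = function_body[i]
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth -= 1
            elif ch == "," and depth == 1:
                commas += 1
            if depth > 0 and not ch.isspace():
                has_arg = True
            i += 1
        if depth == 0:  # skip unbalanced (truncated) call sites
            counts.append(commas + 1 if has_arg else 0)
    return counts


def arity_compatible(function_body: str, func_name: str, declared: int) -> bool:
    counts = count_call_site_args(function_body, func_name)
    # No countable call site: don't filter. Otherwise accept if ANY site matches.
    return not counts or declared in counts


body = "init(); setup(); init(ctx, cfg);"
print(count_call_site_args(body, "init"))
print(arity_compatible(body, "init", 2))
```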

zvigrinberg · 2026-07-02T10:57:25Z

+        norm_module = module_path.rstrip("/")
+        if dir_path == norm_module or dir_path.startswith(norm_module + "/"):
+            return dir_path


Medium: _go_subpackage_flow_control doesn't catch ValueError from get_function_name, unlike python_flow_control fixed in the same PR

This PR adds try/except ValueError: continue to python_flow_control (line ~188 of the diff), but the new _go_subpackage_flow_control calls get_function_name unguarded:

function_name = self.lang_parser.get_function_name(doc)

Go's get_function_name raises ValueError on malformed function headers (line 416 in golang_functions_parsers.py: raise ValueError(f"Invalid function header")). A single malformed document crashes the entire locate_functions call.

Suggestion: Add the same guard:

    for doc in package_docs:
        if self.lang_parser:
            try:
                function_name = self.lang_parser.get_function_name(doc)
            except ValueError:
                continue
            if function_name:
                func_to_docs[function_name].append(doc)

zvigrinberg · 2026-07-02T15:20:40Z

+        if not path.endswith(".go"):
+            continue


Suggested change

-        if not path.endswith(".go"):
-            continue
+        go_func_parser = GoLanguageFunctionsParser()
+        extensions = go_func_parser.supported_files_extensions()
+        if not any(path.endswith(ext) for ext in extensions):
+            continue

tmihalac force-pushed the CCA-Argument-Count-Pre-filter branch from 0636a88 to 759ac49 June 28, 2026 05:47

tmihalac added 14 commits June 28, 2026 13:19

Remove parser_threshold parameter from ExtendedLanguageParser creation

56bde63

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

Removed a test

7af1213

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

Added debug logging

15ee1eb

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

Added debug logging

3702027

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

Added debug logging

a220084

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

Performance fixes for JS

b7ec881

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

Fixed tests

8418adf

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

tmihalac force-pushed the CCA-Argument-Count-Pre-filter branch from 023d7a3 to bde7996 June 28, 2026 20:37

Increase cores to 3

e0cc7c5

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

tmihalac added 4 commits June 29, 2026 10:13

Removed AVOID UNANSWERABLE QUESTIONS from the prompt

e1c04bd

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

Increased uber-jar threshold to 1000

09c0f22

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

tmihalac added 2 commits June 30, 2026 18:13

Removed debug logging

6bebe70

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

Removed logging and set maven local repo in lint-test

d267ff2

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>

tmihalac requested a review from zvigrinberg July 1, 2026 08:37

tmihalac changed the title ~~Add CCA import-based pre-filter, cycle detection, and log lazy formatting~~ CCA pre-filters (Java import, C arg-count, Go sub-package), JS tree-sitter rewrite, RPM version guard, ~40 bug fixes, ~200 new tests Jul 1, 2026

zvigrinberg requested changes Jul 1, 2026

RedTanny requested changes Jul 1, 2026

zvigrinberg requested changes Jul 2, 2026

zvigrinberg reviewed Jul 2, 2026

tmihalac commented Jun 23, 2026 • edited

Java CCA

C CCA

Go CCA

CCA / all ecosystems

JS parser fixes

RPM checker

Config scanner

Source code bug fixes (~40)

Build & infra

Tests

vbelouso commented Jun 23, 2026 • edited

✅ Snyk checks have passed. No issues have been found so far.

tmihalac commented Jun 23, 2026

tmihalac commented Jun 24, 2026

tmihalac commented Jun 24, 2026

tmihalac commented Jun 25, 2026

tmihalac commented Jun 26, 2026

tmihalac commented Jun 26, 2026

tmihalac commented Jun 28, 2026

tmihalac commented Jun 28, 2026

tmihalac commented Jun 28, 2026

tmihalac commented Jun 28, 2026

tmihalac commented Jun 28, 2026

tmihalac commented Jun 28, 2026

tmihalac commented Jun 28, 2026

tmihalac commented Jun 28, 2026

tmihalac commented Jun 28, 2026

tmihalac commented Jun 29, 2026

tmihalac commented Jun 29, 2026

tmihalac commented Jun 30, 2026

tmihalac commented Jun 30, 2026

tmihalac commented Jun 30, 2026

tmihalac commented Jun 30, 2026

tmihalac commented Jul 1, 2026

zvigrinberg left a comment

zvigrinberg left a comment