CCA pre-filters (Java import, C arg-count, Go sub-package), Java CCA cycle detection & perf, RPM version guard, ~40 bug fixes, ~200 new tests #261
✅ Snyk checks have passed. No issues have been found so far.
Force-pushed from 0636a88 to 759ac49
- Replaced esprima-based JavaScript segmenter with tree-sitter for reliable parsing of modern JS syntax (optional chaining, nullish coalescing, top-level await)
- Fixed JS function name extraction: keyword filtering, position-aware matching, redundant pattern removal, generator/TypeScript/anonymous-export support
- Added build-artifact filtering (should_skip) that excludes app-level dist/, build/static/, .min.js while preserving node_modules/*/dist/ as legitimate third-party source
- Added empty-name guards in CCA BFS to prevent documents with unextractable function names from entering call-chain analysis
- Fixed _get_function_calls regex to detect calls through optional chaining (obj?.method())
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
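The build-artifact rule in this commit can be illustrated with a small path classifier. This is a hypothetical sketch, not the PR's `should_skip`: the name `looks_like_build_artifact` is invented, and it assumes minified `.min.js` files are skipped everywhere while `dist/` and `build/static/` are skipped only outside `node_modules`.

```python
from pathlib import PurePosixPath


def looks_like_build_artifact(path: str) -> bool:
    """Hypothetical sketch: True if a JS file path looks like app-level build output."""
    parts = PurePosixPath(path).parts
    # Assumption: minified bundles are never useful for call-chain analysis.
    if path.endswith(".min.js"):
        return True
    # Third-party packages keep their dist/ folders (node_modules/*/dist/).
    if "node_modules" in parts:
        return False
    # App-level bundler output.
    if "dist" in parts:
        return True
    # build/static/ (for example, create-react-app output).
    return any(a == "build" and b == "static" for a, b in zip(parts, parts[1:]))
```

Checking `node_modules` before `dist` is what lets `node_modules/lodash/dist/lodash.js` survive while `app/dist/main.js` is dropped.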
formatting
- Add _can_reference_class() to JavaChainOfCallsRetriever for 4-way import visibility check (simple class name, wildcard import, same package, same artifact)
- Apply import pre-filter in _get_possible_docs via optional declaring_fqcn/callee_file_name/code_documents params
- Pass declaring FQCN from __find_caller_function to _get_possible_docs to eliminate irrelevant uber-JAR candidates before expensive type resolution
- Only filter third-party candidates; application code (root docs) always passes to avoid false negatives from polymorphic interface calls
- Add DFS cycle detection guard in get_relevant_documents to prevent infinite loops from self-recursive or mutually recursive method calls
- Switch logger.debug in __check_identifier_resolved_to_callee_function_package from f-strings to %s lazy formatting to avoid string construction when debug logging is disabled
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
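The DFS cycle detection guard described in this commit can be sketched as a depth-first search that refuses to re-enter a function already on the current path. The graph representation (`callers_of` returning caller names as strings) and the name `find_call_chain` are hypothetical; the PR's `get_relevant_documents` works on documents, not strings.

```python
from typing import Callable, Iterable, List, Optional


def find_call_chain(start: str,
                    is_target: Callable[[str], bool],
                    callers_of: Callable[[str], Iterable[str]]) -> Optional[List[str]]:
    """Hypothetical sketch: walk from `start` towards a target caller, depth first.

    Functions already on the current path are skipped, so self-recursive or
    mutually recursive calls cannot cause an infinite loop.
    """
    path: List[str] = []
    on_path: set = set()

    def dfs(node: str) -> bool:
        if node in on_path:  # cycle: node is already being explored
            return False
        path.append(node)
        on_path.add(node)
        if is_target(node):
            return True
        for caller in callers_of(node):
            if dfs(caller):
                return True
        # Backtrack so the node may still appear on a different path.
        path.pop()
        on_path.discard(node)
        return False

    return list(path) if dfs(start) else None


graph = {"vuln": ["a"], "a": ["a", "b"], "b": ["a", "main"], "main": []}
chain = find_call_chain("vuln", lambda n: n == "main", lambda n: graph.get(n, []))
```

This guard only breaks cycles; a node reachable through many distinct paths can still be explored repeatedly, which would need a separate memo of already-failed nodes.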
escaping, and CCA empty-result guidance
- Deduplicate parents list in non-Java CCA tree_dict to prevent duplicate entries from dependency tree builder
- Add Go subpackage prefix matching to Rule 8 so "github.com/lib/foo/bar" matches target "github.com/lib/foo"
- Add distinct "function not found" message when CCA returns empty call_hierarchy_list so agent distinguishes missing function from unreachable function
- Escape regex metacharacters in Function Caller Finder query builder to handle identifiers containing dots, brackets, and plus sign
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
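Rule 8's sub-package prefix matching amounts to "equal, or starts with the target followed by a slash". A minimal sketch with a hypothetical name (`matches_target_package`); the slash boundary is what keeps `github.com/lib/foobar` from matching `github.com/lib/foo`.

```python
def matches_target_package(pkg: str, target: str) -> bool:
    """Hypothetical sketch: exact match, or pkg is a Go sub-package of target."""
    pkg, target = pkg.rstrip("/"), target.rstrip("/")
    return pkg == target or pkg.startswith(target + "/")
```

A plain `pkg.startswith(target)` would be shorter but would also accept sibling modules that merely share a name prefix.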
- Fix dep_tree.py missing comma causing "--pythonvenv_python" string
concatenation
- Fix dep_tree.py C/C++ detect_ecosystem walk not using
_WALK_EXCLUDE_DIRS
- Fix c_segmenter_custom.py remove_comments stripping patterns inside
string literals
- Fix c_lang_function_parsers.py debug print statement left in
production code
- Fix golang_functions_parsers.py `len(declaration_parts) == (2 or 3)`
always-true comparison
- Fix golang_functions_parsers.py no-op `re.search("")` call
- Fix golang_functions_parsers.py is_package_imported raw string
split and missing quote stripping
- Fix golang_functions_parsers.py is_same_package crash on empty
input
- Fix javascript_functions_parser.py is_comment_line missing block
comment continuation (`*`)
- Fix javascript_functions_parser.py _extract_class_name regex
missing `$` in identifier class
- Fix javascript_functions_parser.py _parse_declarations unused
is_multiline parameter
- Fix javascript_functions_parser.py backreference `[^\1]` in string
pattern
- Fix python_functions_parser.py is_same_package returning True for
two empty strings
- Fix python_segmenters_with_classes_methods.py annotating all
methods with last class name
- Fix python_segmenters_with_classes_methods.py skipping async def
methods
- Fix source_code_git_loader.py safe.directory guard unnecessarily
gated on clone_url
- Fix brew_downloader.py returning path even with zero downloads
- Fix brew_downloader.py extracting SRPM from cache path instead of
target path
- Fix configuration_scanner.py re.match allowing partial filename
matches
- Fix configuration_scanner.py max_results=0 returning 1 result
- Fix configuration_scanner.py cache race condition on repo_key check
outside lock
- Fix configuration_scanner.py missing docker-compose*.yaml pattern
- Fix import_usage_analyzer.py empty short_name matching everything
- Fix async_http_utils.py off-by-one in retry count (`<=` vs `<`)
- Fix async_http_utils.py consumer errors caught by retry loop
instead of propagating
- Fix async_http_utils.py retry_on_client_errors overridden by
Retry-After check
- Fix async_http_utils.py negative sleep from X-RateLimit-Reset in
the past
- Fix async_http_utils.py missing @functools.wraps on retry_async
wrapper
- Fix function_name_locator.py python_flow_control crash on
non-function documents
- Fix function_name_locator.py Go versioned module short-name
collision (v2, v3)
- Fix git_commit_searcher.py _rank_results mutating confidence
in-place
- Fix git_repo_manager.py double-wrapping GitCommandError
- Fix intel_utils.py parse_cpe checking split_cpe[5] instead of
split_cpe[10] for system
- Fix llm_engine_utils.py assert False in production code replaced
with RuntimeError
- Fix repo_resolver.py case-sensitivity bug in normalize_package_name
- Fix serp_api_wrapper.py key index not reset after full rotation
- Fix serp_api_wrapper.py dead max_retries field
- Fix csaf_generator.py GHSA description dropped when no pre-existing
note
- Fix csaf_generator.py notes appended with text: None when
summary/justification missing
- Fix web_patch_fetcher.py missing asyncio import for TimeoutError
catch
- Fix web_patch_fetcher.py _is_commit_url false positive on /c/
outside kernel.org
- Fix web_patch_fetcher.py dropping Gitiles commit URLs from
candidates
- Fix prompting.py build_tool_descriptions missing FL, CONFIG, IUA,
GREP entries
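The SerpAPI fix in this list (key index not reset after full rotation) depends on every read and write of the shared, class-level index going through one lock. A minimal sketch of that pattern; `KeyRotator` and its methods are hypothetical names, not the `serp_api_wrapper.py` API.

```python
import threading
from typing import List


class KeyRotator:
    """Hypothetical sketch: API keys shared by all instances, rotated under one lock."""

    _keys: List[str] = []
    _index = 0
    _lock = threading.Lock()

    @classmethod
    def configure(cls, keys: List[str]) -> None:
        with cls._lock:
            cls._keys = list(keys)
            cls._index = 0

    @classmethod
    def current_key(cls) -> str:
        with cls._lock:
            return cls._keys[cls._index]

    @classmethod
    def rotate_next_key(cls) -> str:
        """Advance to the next key. After the last key, reset to the first and raise."""
        with cls._lock:
            if cls._index + 1 >= len(cls._keys):
                # Reset under the same lock so it cannot interleave with
                # another caller's rotation.
                cls._index = 0
                raise RuntimeError("All API keys exhausted")
            cls._index += 1
            return cls._keys[cls._index]
```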
Test correctness fixes:
- Replace tautological assertions (disjunctive or, truthiness-only,
conditional if-then-assert)
- Rewrite tests that reimplemented source logic instead of calling
real functions
- Fix mock searcher ignoring tantivy query parameter in IUA tests
- Fix test_stub_only_triggers_pypi_fetch swallowing all exceptions
via try/except pass
- Fix test_clone_failure_cleans_temp_dir vacuously-passing assertion
- Fix test_consumer_error_propagates using overly broad
pytest.raises(Exception)
- Fix test_optional_chaining_preservation asserting on input string
not parsed output
- Fix test_remove_comments_string_literal wrong docstring and
tautological assertion
- Fix test_key_rotation not verifying actual key sent in HTTP request
- Fix test_all_tools_produce_7_descriptions omitting FL, CONFIG, IUA,
GREP
- Fix conditional assertion in git_commit_searcher silently passing
on None
- Fix test_third_party_docs weak assertion not verifying actual jar
key
- Fix test_llm_engine_utils disjunctive or assertion masking wrong
return value
Agent/pipeline coverage:
- Add pre_process_node tests for ReachabilityAgent and
CodeUnderstandingAgent
- Add _postprocess_results exception handling tests
- Add dispatch_question exception fallback and build_routing_prompt
integration tests
- Add Rule 8 vs Rule 9 priority interaction test
- Add thought_node actions-is-None and observation_node
truncation/pruning tests
- Add _build_tool_guidance_for_ecosystem per-ecosystem filtering
tests
Java CCA coverage:
- Add function_called_from_caller_body tests (24 cases)
- Add extract_from_query, infer_class_name_and_package_name tests
- Add is_java_fqcn, extract_maven_artifact, _is_doc_excluded tests
- Add __find_caller_function and __find_initial_function direct tests
JS parser/segmenter coverage:
- Add search_for_called_function branch tests
- Add is_valid, create_map_of_local_vars, is_exported_function tests
- Add _get_tree caching, should_skip, nested class extraction tests
C segmenter coverage:
- Add find_top_level_blocks, remove_macro_blocks,
extract_define_functions tests
Go/Python/C parser coverage:
- Add is_tree_key_match, get_function_name, is_package_imported edge
case tests
- Add Python utility method and class-without-parens tests
- Add C get_package_names, filter_docs, document_imports_package
tests
Tools coverage:
- Add FL stdlib_cache, flow_control, singleton isolation tests
- Add config scanner cache eviction and concurrent access tests
- Add IUA query verification and comment-line counting tests
- Add git_commit_searcher _fetch_patch_via_http tests
External integration coverage:
- Add web patch fetcher parsing, Gitiles URL, commit extraction tests
- Add async HTTP retry limit, raise_for_status, 500 boundary tests
- Add SERP key exhaustion reset and error propagation tests
- Add git_repo_manager clone, fetch, concurrency, host validation
tests
VEX/intel/version coverage:
- Add unexpected justification_label, RPM+NVD range, version check
error tests
- Add package identifier utility method tests
- Add _is_safe_url, identify() with intel=None tests
LLM engine/checklist/prompting coverage:
- Add preprocess_engine_input, postprocess_engine_output branch tests
- Add build_no_vuln_packages_output justification tests
- Add generate_checklist, build_tool_descriptions per-tool tests
Remaining coverage:
- Add _ensure_venv, determine_python_version,
vulnerability_intel_sanitizer tests
- Add source_classification, credential_client, transitive_detection
tests
- Rewrite cve_fetch_patches tests to call real _arun
Test file consolidation:
- Merge 18 deleted test files into consolidated per-domain test
modules
- Add 13 new focused test files for previously uncovered modules
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
- Fix serp_api_wrapper.py callers passing removed max_retries field (Pydantic ValidationError at runtime)
- Fix web_patch_fetcher.py _fetch_gitiles_patch yarl double-encoding %5E%21 (use yarl.URL(encoded=True))
- Fix java_functions_parsers.py _count_call_args treating < comparison as generic bracket (dual-comma fallback)
- Fix repo_resolver.py normalize_package_name dropping original case for mixed-case JSON keys (NetworkManager)
- Fix javascript_functions_parser.py is_comment_line classifying generator *method() as block comment
- Fix golang_functions_parsers.py is_package_imported unescaped identifier in regex (add re.escape)
- Fix configuration_scanner.py cache read outside lock causing KeyError on concurrent eviction (use .get())
Convention fixes:
- Fix test_java_cca.py _extract docstring referencing search_for_called_function
- Fix test_go_parser.py docstrings referencing fix history instead of describing behavior
Tests:
- Add SerpAPI extra_forbidden validation test
- Add Gitiles yarl.URL encoding preservation tests
- Add _count_call_args unbalanced angle bracket tests (comparison, bit shift, ternary)
- Add normalize_package_name mixed-case preservation tests
- Add is_comment_line generator method vs block comment tests
- Add is_package_imported regex escape and substring rejection tests
- Add configuration scanner cache eviction safety tests
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
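One way to read the "dual-comma fallback" for `_count_call_args` is to compute two argument counts, one treating `<`/`>` as generic brackets and one treating them as operators, and let the caller accept a call site if the declared arity matches either. The sketch below is a hypothetical variant of that idea (`count_args_candidates` is not the PR's function), and it does not handle string literals.

```python
from typing import Set


def count_args_candidates(args: str) -> Set[int]:
    """Hypothetical sketch: plausible top-level argument counts for the text
    between a Java call's parentheses."""
    if not args.strip():
        return {0}
    depth = 0           # nesting of (), [] and {}
    angle = 0           # nesting of < >, assuming they are generics
    commas_plain = 0    # top-level commas, treating < > as operators
    commas_generic = 0  # top-level commas outside any < >
    for ch in args:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == "<":
            angle += 1
        elif ch == ">":
            if angle > 0:
                angle -= 1
        elif ch == "," and depth == 0:
            commas_plain += 1
            if angle == 0:
                commas_generic += 1
    candidates = {commas_plain + 1}
    if angle == 0:  # balanced < >: the generic reading is also plausible
        candidates.add(commas_generic + 1)
    return candidates
```

A caller would then keep the edge when `declared in count_args_candidates(args)`, so `assertTrue(x < 10, y > 5)` (candidates `{1, 2}`) still matches a two-parameter declaration.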
packages are searched before library packages
- Fixes nondeterministic timeouts in JS transitive search (test_java_script_transitive_search_1 hung ~80% of runs)
- Root cause: parent order from _get_parents was nondeterministic; when a library package (e.g. @cyclonedx/cyclonedx-library) was iterated first, DFS entered intra-package call chains (package lists itself as own parent) and explored hundreds of branches before finding the root caller
- No search paths removed; only iteration order changed to prioritize root_project
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
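The ordering change can be sketched as a deterministic sort that puts the root project first and breaks ties by name. `order_parents` and the tie-break by package name are assumptions; the commit only states that root-level packages are searched before library packages.

```python
from typing import List


def order_parents(parents: List[str], root_project: str) -> List[str]:
    """Hypothetical sketch: root project first, then the rest in a stable, sorted order."""
    unique = list(dict.fromkeys(parents))  # drop duplicates, keep first occurrence
    # False sorts before True, so the root project always comes first.
    return sorted(unique, key=lambda p: (p != root_project, p))
```

Sorting by name makes every run explore library parents in the same order, which is what turns an intermittent hang into reproducible behavior.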
quality, misc fixes
- Add argument-count pre-filter to C parser's search_for_called_function to reject cross-package false positives where a same-named function has different arity (e.g. rsync read_byte(1 param) vs PostgreSQL read_byte macro with 3 args)
- Add _count_c_declared_params and _count_call_site_args helpers for the pre-filter
- Update search_for_called_function signature: callee_function is now a positional parameter (was keyword-only callee_function:Document = None)
- Add 6 new test cases for argument-count filtering (match, mismatch, variadic, no-callee, void-param) and 12 tests for the counting helpers
- Parallelize source JAR extraction using ThreadPoolExecutor with cgroup-aware worker count (_available_cpus helper reads /sys/fs/cgroup/cpu.max)
- Add -Dmaven.artifact.threads=10 to all mvn dependency:copy-dependencies, clean install, and depgraph-maven-plugin invocations in dep_tree.py
- Set MAVEN_OPTS with -Dmaven.repo.local on shared PVC in and exploit_iq_service.yaml for persistent Maven cache across runs
- Add "AVOID UNANSWERABLE QUESTIONS" section to checklist prompt to prevent runtime-state questions that static analysis tools cannot answer
- Shorten cve_verify_vuln_package LLM response instruction to one sentence
- Update test_agent.py: _build_observation_context now takes critical_context list parameter; add test_crit_context_merged_into_knowledge; update pre_process_node assertions for critical_context field
- Add CVE-2025-48734 comment on test_transitive_search_java_1
- Remove section-separator comments from test_c_parser.py
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
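The arity pre-filter needs the declared parameter count, where `(void)` means zero parameters and a trailing `...` means "at least N". A hypothetical sketch of that counting (`declared_params` and `arity_compatible` are invented names, not the PR's `_count_c_declared_params`); it handles nested parentheses such as function-pointer parameters but not every C declarator form.

```python
from typing import List, Optional, Tuple


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside () or []."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def declared_params(signature: str) -> Optional[Tuple[int, bool]]:
    """Return (fixed_param_count, is_variadic) for a C function header, or None."""
    start = signature.find("(")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(signature)):
        if signature[i] == "(":
            depth += 1
        elif signature[i] == ")":
            depth -= 1
            if depth == 0:
                params = signature[start + 1:i].strip()
                break
    else:
        return None  # unbalanced parentheses
    if params == "":
        return None  # "f()" is not a prototype before C23: arity unknown
    if params == "void":
        return 0, False
    parts = [p.strip() for p in split_top_level(params)]
    if parts[-1] == "...":
        return len(parts) - 1, True
    return len(parts), False


def arity_compatible(signature: str, call_arg_count: int) -> bool:
    decl = declared_params(signature)
    if decl is None:
        return True  # cannot determine arity: do not reject the edge
    fixed, variadic = decl
    return call_arg_count >= fixed if variadic else call_arg_count == fixed
```

Returning "compatible" whenever the header cannot be parsed keeps the pre-filter conservative: it only removes edges it can positively rule out.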
Force-pushed from 023d7a3 to bde7996
GUARD
- Emit TARGET_IN_VULNERABLE_RANGE (YES/NO) in VulnerabilityIntel.format_for_prompt() so the L1 agent can see the field referenced by the VERSION-BASED FALLBACK rules
- Add VERSION GUARD clause to Case B sys prompt CONCLUSION section: when TARGET_IN_VULNERABLE_RANGE is YES, a grep match alone is not sufficient to conclude PATCHED; the fix must be at the exact CVE location, not a similar pre-existing check
- Add VERSION GUARD phase to Case B thought instructions PHASE 3 VERDICT: when fix pattern was found but target is in vulnerable range, verify the match is the exact CVE fix or conclude VULNERABLE (version-based)
- Add comment explaining the CVE-2024-48957/libarchive triggering case for the Case B prompt changes
- Add 21 tests in test_vulnerability_intel_format.py covering format_for_prompt() field emission (TARGET_IN_VULNERABLE_RANGE, downstream patch, vulnerable/fix patterns, bitness, ordering), select_upstream_prompt_and_instructions routing, and Case B prompt content assertions
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
build/tool config patterns
- Add _CONFIG_DIR_ALLOWED_EXTENSIONS allowlist for files matched only by directory
- Fixes Keycloak issue where .js/.css/.map files under resources/ were collected
- Add build/tool config patterns: pyproject.toml, setup.cfg, tox.ini, tsconfig.json, .eslintrc.json, Makefile, CMakeLists.txt, meson.build
- Extensionless files in config dirs still accepted
- Update test_collects_files_in_config_dir to use .xml instead of .txt
- Add 10 new tests for allowlist, build tool patterns, and negative cases
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
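The allowlist rule in this commit (directory-matched files need an allowed extension, extensionless files still pass, named build/tool files always pass) can be sketched as below. The directory names and extension set are placeholder assumptions; the real contents of `_CONFIG_DIR_ALLOWED_EXTENSIONS` are not listed in the commit.

```python
from pathlib import PurePosixPath

# Placeholder values (assumptions), not the scanner's real configuration.
CONFIG_DIR_NAMES = frozenset({"config", "conf", "resources"})
CONFIG_DIR_ALLOWED_EXTENSIONS = frozenset(
    {".xml", ".yaml", ".yml", ".json", ".properties", ".conf", ".ini", ".toml"})
BUILD_TOOL_FILES = frozenset({
    "pyproject.toml", "setup.cfg", "tox.ini", "tsconfig.json",
    ".eslintrc.json", "Makefile", "CMakeLists.txt", "meson.build",
})


def is_config_file(path: str) -> bool:
    """Hypothetical sketch of directory-matched config collection with an allowlist."""
    p = PurePosixPath(path)
    if p.name in BUILD_TOOL_FILES:  # matched by file name: always accepted
        return True
    if any(part in CONFIG_DIR_NAMES for part in p.parts[:-1]):
        # Matched only by directory: require an allowlisted extension,
        # but keep accepting extensionless files.
        return p.suffix == "" or p.suffix in CONFIG_DIR_ALLOWED_EXTENSIONS
    return False
```

With this shape, Keycloak-style `resources/` trees contribute `.properties` or `.xml` files but not bundled `.js`, `.css`, or `.map` assets.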
improvements
- Merge 9 in-tree test files into canonical tests/ location, delete in-tree copies:
  test_configuration_scanner, test_credential_client, test_import_usage_analyzer, test_javascript_functions_parser, test_source_code_git_loader, test_transitive_detection, test_version_check, test_vulnerability_intel_sanitizer, test_web_patch_fetcher
- Config scanner: add _CONFIG_DIR_ALLOWED_EXTENSIONS allowlist for directory-matched files, filtering out .js/.css/.map/.java/.py etc. from config collection
- Config scanner: add build/tool config patterns (pyproject.toml, setup.cfg, tox.ini, tsconfig.json, .eslintrc.json, Makefile, CMakeLists.txt, meson.build)
- Add MAVEN_OPTS with persistent repo cache to on-pull-request.yaml
- Add 30+ new tests across merged files covering allowlist enforcement, build tool patterns, dependency manifests negative cases, and unique tests from in-tree files
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
uber-jar threshold
- Add Go sub-package awareness to Function Locator (_extract_go_subpackage, _go_subpackage_flow_control)
- Add CCA sub-package filtering via resolve_subpackage_to_module in Go parser and tree fallback in chain_of_calls_retriever
- Add Go sub-package enrichment from patches in intel_utils (extract_go_subpackages_from_patch)
- Pass candidate_packages to enrich_vulnerable_functions_from_patch in cve_agent
- Add base resolve_subpackage_to_module to LangFunctionsParser (returns None for non-Go)
- Add GOCACHE env var to on-cm-runner.yaml, on-pull-request.yaml, exploit_iq_service.yaml
- Revert uber_jar_file_threshold from 1000 back to 600 in all config files
- Add 38 tests for Go sub-package fixes (FL feedback, CCA filtering, intel enrichment)
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
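Enriching vulnerable functions with Go sub-packages from a patch starts by mapping each changed `.go` file to its package import path (module path plus directory). This is a hypothetical sketch, not the PR's `extract_go_subpackages_from_patch`: it reads only the `+++ b/` headers of a unified diff, skips `_test.go` files, and assumes the repository root is the module root, so nested modules with their own `go.mod` are not handled.

```python
import posixpath
from typing import Set


def go_subpackages_from_patch(patch: str, module_path: str) -> Set[str]:
    """Hypothetical sketch: import paths of Go packages touched by a unified diff."""
    subpkgs: Set[str] = set()
    for line in patch.splitlines():
        if not line.startswith("+++ b/"):
            continue  # only post-image file headers; "+++ /dev/null" is skipped
        path = line[len("+++ b/"):].strip()
        if not path.endswith(".go") or path.endswith("_test.go"):
            continue
        directory = posixpath.dirname(path)
        subpkgs.add(f"{module_path}/{directory}" if directory else module_path)
    return subpkgs


patch = (
    "--- a/encoding/protojson/decode.go\n"
    "+++ b/encoding/protojson/decode.go\n"
    "@@ -1,1 +1,1 @@\n"
    "-old\n"
    "+new\n"
    "--- a/doc.go\n"
    "+++ b/doc.go\n"
    "--- a/internal/x/x_test.go\n"
    "+++ b/internal/x/x_test.go\n"
)
touched = go_subpackages_from_patch(patch, "google.golang.org/protobuf")
```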
- Change build_short_go_package_name to store list of packages per short name instead of overwriting
- Search all matching packages in locate_functions via any() when short name resolves to multiple packages
- Add test for unrelated packages sharing the same short name (github.com/foo/util vs github.com/bar/util)
- Update existing short_name tests to expect list values and verify both versions preserved on collision
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
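Storing a list per short name, instead of overwriting, can be sketched with a `defaultdict(list)`. The names are hypothetical, and the skipping of a Go major-version suffix such as `/v2` is an assumption modeled on the "Go versioned module short-name collision" fix listed earlier in this PR.

```python
import re
from collections import defaultdict
from typing import Dict, List


def go_short_name(pkg: str) -> str:
    """Last path segment, skipping a Go major-version suffix such as /v2."""
    segments = pkg.rstrip("/").split("/")
    if len(segments) > 1 and re.fullmatch(r"v\d+", segments[-1]):
        return segments[-2]
    return segments[-1]


def build_short_name_index(packages: List[str]) -> Dict[str, List[str]]:
    """Hypothetical sketch: short name -> every full package path that has it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for pkg in packages:
        bucket = index[go_short_name(pkg)]
        if pkg not in bucket:
            bucket.append(pkg)
    return dict(index)


idx = build_short_name_index(
    ["github.com/foo/util", "github.com/bar/util", "github.com/baz/lib/v2", "github.com/baz/lib"])
```

A lookup can then test all candidates, for example `any(p in source_path for p in idx.get("util", []))`, rather than trusting whichever package was inserted last.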
- Remove C-L, C-H, C-M, A-H, B-M coverage-tracking labels from comments, docstrings, and section headers
- Remove "Coverage gap tests:" and "Fix A/B/C:" prefixes from test section comments
- Keep descriptive text, only strip the label identifiers
- 22 files cleaned across tests/ and src/*/tests/
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
zvigrinberg left a comment
Hi @tmihalac,
Please see the comments below.
```diff
         return self.common_flow_control(function, package_docs)
     case Ecosystem.GO.value:
-        return self.common_flow_control(function, package_docs)
+        return self._go_subpackage_flow_control(function, package_docs, package)
```
Bug: `module_path` receives unresolved short name; sub-package disambiguation is dead code
The refactoring that introduced `packages_to_search` stopped reassigning `package` to the resolved full module path. This call:
```python
return self._go_subpackage_flow_control(function, package_docs, package)
```
passes the original short name (e.g., `'protojson'`) as `module_path`. Inside `_extract_go_subpackage`, the check `dir_path.startswith(norm_module + '/')` never matches a short name against a full source path like `google.golang.org/protobuf/encoding/protojson/...`. Every function falls through to `return module_path`, collapsing all sub-packages into one bucket, so `len(subpkg_to_funcs) <= 1` is always true and the multi-sub-package logic never triggers.
Fix: Pass the resolved module path(s) from `packages_to_search`:
```python
return self._go_subpackage_flow_control(function, package_docs, packages_to_search[0])
```
Or iterate over all resolved paths if multiple are possible.
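The failure mode described in this comment can be reproduced with a standalone copy of the quoted `dir_path.startswith(norm_module + '/')` check. `extract_subpackage` is a hypothetical stand-in for `_extract_go_subpackage`, written only from the behavior described in the comment.

```python
import posixpath


def extract_subpackage(source_path: str, module_path: str) -> str:
    """Hypothetical sketch: the source file's directory if it lies inside
    module_path, otherwise module_path itself (the fall-through case)."""
    dir_path = posixpath.dirname(source_path)
    norm_module = module_path.rstrip("/")
    if dir_path == norm_module or dir_path.startswith(norm_module + "/"):
        return dir_path
    return module_path


files = [
    "google.golang.org/protobuf/encoding/protojson/decode.go",
    "google.golang.org/protobuf/encoding/prototext/decode.go",
]
# Short name: every file falls through, so all functions land in one bucket.
short_buckets = {extract_subpackage(f, "protojson") for f in files}
# Resolved module path: one bucket per sub-package.
full_buckets = {extract_subpackage(f, "google.golang.org/protobuf") for f in files}
```

Only the resolved module path yields more than one bucket, which is the condition the multi-sub-package branch depends on.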
```python
def _count_call_site_args(function_body: str, func_name: str) -> int | None:
    """Count arguments at the first call site of func_name(...) in function_body.
    Returns None if the call site cannot be parsed."""
```
Bug: Comma counting ignores string/char literals, causing false rejections
`_count_call_site_args` iterates characters tracking only parenthesis depth:
```python
for ch in args_str:
    if ch == '(':
        depth += 1
    elif ch == ')':
        depth -= 1
    elif ch == ',' and depth == 0:
        count += 1
```
This doesn't skip commas inside string or character literals. A call like `func("error: a,b", x)` is counted as 3 args instead of 2, causing a spurious mismatch against the declared parameter count and rejecting a valid call chain.
This is a common pattern in C code (format strings, error messages like `fprintf(stderr, "expected %d, got %d", a, b)`).
The Java counterpart `_count_call_args` has a related issue where `<` in comparison expressions (e.g., `a < b`) is treated as a generic angle-bracket opener.
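A literal-aware version of the comma count only needs to track whether it is inside a quoted literal and whether the previous character was a backslash. A minimal sketch (`count_call_args` is a hypothetical name; comments, macros, and wide or raw string prefixes are not handled):

```python
from typing import Optional


def count_call_args(args: str) -> Optional[int]:
    """Hypothetical sketch: number of top-level arguments in the text between a
    C call's parentheses, ignoring commas inside string/char literals and nested
    brackets. Returns None for unbalanced input."""
    if not args.strip():
        return 0
    depth = 0
    count = 1
    quote = None      # the active quote character while inside a literal
    escaped = False   # previous character was a backslash inside a literal
    for ch in args:
        if quote is not None:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
            continue
        if ch in "\"'":
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
            if depth < 0:
                return None
        elif ch == "," and depth == 0:
            count += 1
    if quote is not None or depth != 0:
        return None
    return count
```

For the `fprintf(stderr, "expected %d, got %d", a, b)` example above, the argument text yields 4 rather than 5.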
```python
    )
    result = list(close_matches)
    result.append(guidance)
    return result
```
Type mixing: Guidance string appended to function-name list
```python
result = list(close_matches)
result.append(guidance)
return result
```
This appends a multi-line INFO string (e.g., `"INFO: Matched functions exist in multiple sub-packages of 'module':\n subpkg1: func1\n..."`) to a list that otherwise contains only function names. The caller `locate_functions` returns this as `"result": result`, documented as `[function_names]`.
While the LLM consumer may parse this gracefully, any programmatic downstream code that iterates over `result` treating every element as a valid function name will produce incorrect lookups or regex errors when it hits the guidance text (which contains spaces, colons, and newlines).
Suggestion: Return the guidance separately:
```python
return {"functions": list(close_matches), "guidance": guidance}
```
Or log it instead of embedding it in the return value.
```python
            params["api_key"] = self._rotate_next_key()
        else:
            raise
    self.__class__._serp_api_key_index = 0
```
Race condition: Resetting shared `_serp_api_key_index` outside the lock
```python
self.__class__._serp_api_key_index = 0
raise Exception("All API keys exhausted")
```
`_serp_api_key_index` is a `ClassVar` shared across all instances. All other mutations (`validate_serp_api_keys`, `_rotate_next_key`) properly guard writes with `_key_rotation_lock`, but this reset happens outside the lock.
If a concurrent caller is mid-rotation (e.g., on key index 3 of 5), this unlocked reset to 0 causes it to retry already-exhausted keys (getting 402/429 loops) or skip keys it hasn't tried.
Fix: Move inside the lock:
```python
with self.__class__._key_rotation_lock:
    self.__class__._serp_api_key_index = 0
raise Exception("All API keys exhausted")
```
```python
def get_possible_docs(self, function_name_to_search: str, package: str, exclusions: list[Document],
                      sources_location_packages: bool,
                      target_class_names: frozenset[str],
                      method_exclusions: dict) -> (list[Document], bool):
```
Nit: Pre-index doesn't achieve algorithmic speedup; still an O(unique_paths) linear scan
```python
candidates = [doc for path, docs in self._source_path_index.items()
              if package in path for doc in docs]
```
This iterates all unique source paths doing a substring match, the same algorithmic complexity as the old linear scan over all documents. The reduction from N (total docs) to U (unique paths) is a constant-factor improvement when files yield many functions, but it's not the O(1) lookup the pre-indexing pattern suggests.
Not a blocker: the constant-factor improvement plus the reordered `search_token` check before expensive `is_function`/`_is_doc_excluded` calls is a net positive. But if this becomes a bottleneck, an inverted index on path segments (mapping each package name to matching paths) would give true O(1) lookup.
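The inverted index suggested in this comment can be sketched as a map from path segment to the set of source paths containing it, so a lookup is a few dictionary hits instead of a scan over every unique path. This is a hypothetical sketch: `SegmentIndex` is an invented name and documents are plain strings standing in for `Document` objects. It matches whole path segments, which is narrower than the current `package in path` substring test (for example, `lodash` no longer matches `lodash-es/...`).

```python
from collections import defaultdict
from typing import Dict, List, Set


class SegmentIndex:
    """Hypothetical sketch: path segment -> source paths that contain that segment."""

    def __init__(self, docs_by_path: Dict[str, List[str]]):
        self._docs_by_path = docs_by_path
        self._paths_by_segment: Dict[str, Set[str]] = defaultdict(set)
        for path in docs_by_path:
            for segment in path.strip("/").split("/"):
                self._paths_by_segment[segment].add(path)

    def candidates(self, package: str) -> List[str]:
        segments = [s for s in package.strip("/").split("/") if s]
        if not segments:
            return []
        # One dictionary lookup per segment instead of a scan over every path.
        paths = set.intersection(
            *(self._paths_by_segment.get(s, set()) for s in segments))
        # Keep the contiguity requirement of `package in path` for multi-segment names.
        hits = sorted(p for p in paths if package in p)
        return [doc for p in hits for doc in self._docs_by_path[p]]


idx = SegmentIndex({
    "pkgs/lodash/index.js": ["d1", "d2"],
    "pkgs/lodash-es/index.js": ["d3"],
    "src/app.js": ["d4"],
})
```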
A few questions before we merge the VERSION GUARD changes:
- What is the basis for this change?
Can you provide a specific CVE case where the current behavior caused an incorrect verdict? Specifically:
Without a concrete example demonstrating the problem, it's hard to evaluate whether this change is necessary or correctly scoped.
- Did you run the modified prompts against actual RPM checker scenarios to verify:
In my experience, LLMs frequently ignore or misinterpret prompt instructions, especially conditional logic like "if X then require Y". Adding a VERSION GUARD clause sounds reasonable in theory, but:
- The model may still conclude PATCHED on a grep match alone
- The model may become overly conservative and mark everything VULNERABLE
- The interaction between this guard and other prompt rules is unpredictable
Suggestion: Before merging:
- find a real use case where this change is needed
- verify that the prompt really works and the LLM does not ignore it
- run a regression test: I have a dataset of 10 cases that I can send to check that the changes do not introduce regressions
zvigrinberg left a comment
Another cycle of review
```python
if stripped.startswith('*'):
    rest = stripped[1:].lstrip()
    if rest and (rest[0].isalnum() or rest[0] in ('$', '_', '[')):
        return False
    return True
```
High: `is_comment_line` misclassifies most JSDoc continuation lines as non-comment
A line like `* Returns the cached value` has `rest='Returns...'` and `rest[0]='R'` is alphanumeric, so the method returns `False`. This means the majority of JSDoc prose lines survive the comment filter. At line 207, unfiltered JSDoc prose is scanned for call patterns, producing false-positive caller-callee edges when prose mentions function names (e.g., `* Delegates to parseJSON`).
The intent was to distinguish `*method()` generator syntax from `* JSDoc` text, but the heuristic catches far too much.
Suggestion: Check whether the line is inside a `/* ... */` block comment rather than inspecting the character after `*`. Or use a narrower heuristic: generator methods start with `*` immediately followed by an identifier without a space:
```python
if stripped.startswith('*'):
    rest = stripped[1:]
    # Generator syntax: *methodName() (no space after *).
    # A closing "*/" is still part of the block comment.
    if rest and not rest[0].isspace() and rest[0] != '/':
        return False
    # JSDoc continuation: * some text (space after *)
    return True
```
```python
if matching and matching.group(0):
    import_line = code_content[matching.start():]
    import_package_line = import_line[:import_line.find(os.linesep)].strip()
```
High: `is_package_imported` regex adds trailing `.*`, matching identifier anywhere in import path
Old regex: `import ['"].*{identifier}['"]` (identifier at end of path).
New regex: `import ['"].*{esc_id}.*['"]` (identifier anywhere in path).
So `identifier='json'` now matches `import "github.com/json-iterator/go"` because `json` appears mid-path. This creates false-positive import matches and allows unrelated packages to pass the call-chain filter.
The `re.escape(identifier)` fix is correct and needed, but the trailing `.*` changes the matching semantics.
Suggestion: Keep the anchor at the end, just add the escape:
```python
esc_id = re.escape(identifier)
matching = re.search(rf"import [\'\"].*{esc_id}[\'\"]", code_content)
```
Or if you need to match identifier as a path segment (not just at the end), use a more precise pattern:
```python
matching = re.search(rf"import [\'\"].*[/]{esc_id}[\'\"]", code_content)
```
@zvigrinberg This comment addresses existing code, not newly added code, so we only need to adopt the newly required escaping of the identifier and keep the greedy quantifier `*` at the end as is.
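The behavior difference discussed in this thread can be checked directly. The two helpers below (hypothetical names) reproduce the end-anchored pattern and the pattern with a trailing `.*`, both with `re.escape` applied. They only look at single-line `import "..."` statements, not Go's parenthesized import blocks.

```python
import re


def imports_anchored(code: str, identifier: str) -> bool:
    """Identifier must immediately precede the closing quote of the import path."""
    esc_id = re.escape(identifier)
    return re.search(rf"import [\'\"].*{esc_id}[\'\"]", code) is not None


def imports_greedy(code: str, identifier: str) -> bool:
    """Identifier may appear anywhere inside the import path."""
    esc_id = re.escape(identifier)
    return re.search(rf"import [\'\"].*{esc_id}.*[\'\"]", code) is not None
```

Both variants benefit from `re.escape`: an identifier such as `b.c` matches only a literal dot, not any character.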
```python
        elif ch == '>' and depth_a > 0:
            depth_a -= 1
        elif ch == ',' and depth_p == 0 and depth_b == 0:
            commas_without_angles += 1
            if depth_a == 0:
                commas_with_angles += 1
    if depth_a == 0:
        return commas_with_angles + 1
    return commas_without_angles + 1
```
High: `_count_call_args` miscounts when `<`/`>` are comparison operators balancing across a comma
For `assertTrue(x < 10, y > 5)`:
- `<` increments `depth_a` to 1
- At the comma, `depth_a != 0`, so only `commas_without_angles` is incremented (not `commas_with_angles`)
- `>` decrements `depth_a` back to 0
- Since `depth_a == 0`, it returns `commas_with_angles + 1 = 1` instead of the correct 2
This causes the arg-count pre-filter to reject valid call sites wherever assertions, comparisons, or ternaries use `<`/`>` across argument boundaries.
Suggestion: Use a two-pass approach: first try treating `<`/`>` as angle brackets. If `depth_a` goes negative at any point (a `>` without a preceding `<`), restart treating all `<`/`>` as operators and count only parenthesis/bracket depth:
```python
# If depth_a ever goes negative, it means > was a comparison, not a bracket close.
# Fall back to ignoring angle brackets entirely.
if depth_a < 0:
    return _count_ignoring_angles(s, open_idx, close_idx)
```
```python
pattern = re.compile(r'\b' + re.escape(func_name) + r'\s*\(')
m = pattern.search(function_body)
if not m:
    return None
```
Medium: C arg-count pre-filter checks only the first call site, rejecting valid matches when a same-named local function appears first
`_count_call_site_args` uses `pattern.search(function_body)` (first match only). The docstring confirms: "Count arguments at the first call site."
If a caller has `init()` (0 args, local helper) textually before `init(ctx, cfg)` (2 args, the real callee), `re.search` finds the wrong one. The arity mismatch causes `return False`, rejecting a valid caller-callee edge.
Suggestion: Use `re.finditer` and check ALL call sites; accept if ANY matches the declared param count:
```python
def _count_call_site_args(function_body: str, func_name: str) -> list[int]:
    """Count arguments at all call sites of func_name(...) in function_body."""
    pattern = re.compile(r'\b' + re.escape(func_name) + r'\s*\(')
    counts = []
    for m in pattern.finditer(function_body):
        count = _count_args_from_match(function_body, m)
        if count is not None:
            counts.append(count)
    return counts
```
Then in the caller: `if declared not in call_arg_counts: return False`
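A runnable version of the all-call-sites suggestion, with the per-match counting helper filled in. `args_at`, `call_site_arg_counts`, and `caller_matches` are hypothetical names; the sketch counts commas by bracket depth only (no string-literal handling), and it assumes that a body with no parseable call site should not reject the edge, which differs from the `if declared not in call_arg_counts: return False` check above when the list is empty.

```python
import re
from typing import List, Optional


def args_at(body: str, open_idx: int) -> Optional[int]:
    """Count top-level arguments for the '(' at open_idx; None if unbalanced."""
    depth = 0
    commas = 0
    for i in range(open_idx, len(body)):
        ch = body[i]
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
            if depth == 0:
                inner = body[open_idx + 1:i]
                return 0 if not inner.strip() else commas + 1
        elif ch == "," and depth == 1:  # depth 1 = directly inside the call
            commas += 1
    return None


def call_site_arg_counts(body: str, func_name: str) -> List[int]:
    pattern = re.compile(r"\b" + re.escape(func_name) + r"\s*\(")
    counts = []
    for m in pattern.finditer(body):
        n = args_at(body, m.end() - 1)  # m.end() - 1 is the '(' position
        if n is not None:
            counts.append(n)
    return counts


def caller_matches(body: str, func_name: str, declared: int) -> bool:
    counts = call_site_arg_counts(body, func_name)
    # Assumption: no parseable call site means "cannot decide", so keep the edge.
    return not counts or declared in counts


body = "void setup(void) { init(); if (ok) init(ctx, cfg); }"
```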
```python
    norm_module = module_path.rstrip("/")
    if dir_path == norm_module or dir_path.startswith(norm_module + "/"):
        return dir_path
```
Medium: `_go_subpackage_flow_control` doesn't catch `ValueError` from `get_function_name`, unlike `python_flow_control` fixed in the same PR
This PR adds `try/except ValueError: continue` to `python_flow_control` (line ~188 of the diff) but the new `_go_subpackage_flow_control` calls `get_function_name` unguarded:
```python
function_name = self.lang_parser.get_function_name(doc)
```
Go's `get_function_name` raises `ValueError` on malformed function headers (line 416 in `golang_functions_parsers.py`: `raise ValueError(f"Invalid function header")`). A single malformed document crashes the entire `locate_functions` call.
Suggestion: Add the same guard:
```python
for doc in package_docs:
    if self.lang_parser:
        try:
            function_name = self.lang_parser.get_function_name(doc)
        except ValueError:
            continue
        if function_name:
            func_to_docs[function_name].append(doc)
```
```python
if not path.endswith(".go"):
    continue
```
Suggested change:
```diff
-if not path.endswith(".go"):
-    continue
+go_func_parser = GoLanguageFunctionsParser()
+extensions = go_func_parser.supported_files_extensions()
+if not any(path.endswith(ext) for ext in extensions):
+    continue
```
Java CCA
_can_reference_class()import-based pre-filter (simple class name, wildcard import, same package, same artifact) to eliminate irrelevant uber-JAR candidates before expensive type resolutionget_relevant_documentsto prevent infinite loops from self-recursive or mutually recursive method callslogger.debugin__check_identifier_resolved_to_callee_function_packagefrom f-strings to%slazy formattingC CCA
- Argument-count pre-filter in search_for_called_function to reject cross-package false positives where a same-named function has a different arity (e.g. rsync read_byte(1) vs PostgreSQL read_byte(3))

Go CCA
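The short_name collision fix in this section comes down to keeping a list of packages per short name instead of a single value. This is a minimal sketch, assuming the short name is the last import-path element, with a trailing /vN major-version element skipped. short_name and index_by_short_name are hypothetical names:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

_MAJOR_VERSION = re.compile(r"v\d+")


def short_name(import_path: str) -> str:
    parts = import_path.rstrip("/").split("/")
    # Assumption: a trailing /vN element is a module major version, not the
    # package name, so github.com/c/bar/v2 and github.com/c/bar share "bar".
    if len(parts) > 1 and _MAJOR_VERSION.fullmatch(parts[-1]):
        return parts[-2]
    return parts[-1]


def index_by_short_name(import_paths: Iterable[str]) -> Dict[str, List[str]]:
    index: Dict[str, List[str]] = defaultdict(list)
    for path in import_paths:
        # Append rather than assign, so a later package with the same short
        # name no longer overwrites an earlier one.
        index[short_name(path)].append(path)
    return dict(index)
```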
- Sub-package handling in the function locator (_extract_go_subpackage, _go_subpackage_flow_control)
- resolve_subpackage_to_module in Go parser and tree fallback in chain_of_calls_retriever
- Sub-package extraction from patches in intel_utils (extract_go_subpackages_from_patch)
- Sub-package matching: github.com/lib/foo/bar matches target github.com/lib/foo
- short_name dict collision: store a list of packages per short name instead of overwriting

CCA / all ecosystems
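Two of the traversal changes in this PR can be shown together on a toy caller graph: the DFS cycle guard (listed under Java CCA) and the root-first ordering of direct_parents. The sketch below is hypothetical. find_call_chain, callers, and root_packages are not names from the PR, and each node is treated as its own package for simplicity:

```python
from typing import Dict, List, Optional, Set


def find_call_chain(callers: Dict[str, List[str]], start: str, target: str,
                    root_packages: Set[str]) -> Optional[List[str]]:
    """Depth-first walk over caller edges; returns the first chain reaching target."""
    chain: List[str] = []
    on_chain: Set[str] = set()

    def dfs(node: str) -> bool:
        if node in on_chain:
            return False  # cycle guard: direct or mutual recursion would loop forever
        chain.append(node)
        on_chain.add(node)
        if node == target:
            return True
        # Deterministic order: root-level packages first, then the rest, by name
        parents = sorted(callers.get(node, []), key=lambda p: (p not in root_packages, p))
        for parent in parents:
            if dfs(parent):
                return True
        chain.pop()
        on_chain.discard(node)
        return False

    return list(chain) if dfs(start) else None
```

Tracking only the current chain keeps the guard precise, but the walk can revisit nodes that are reachable along different paths. If any one chain is enough, a global visited set limits the walk to one visit per node. Deep graphs may also need an iterative version to stay under Python's recursion limit.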
- chain_of_calls_retriever.__init__ (sort_docs, _root_docs, _source_path_index) for O(1) lookups in get_possible_docs instead of scanning all documents
- _is_doc_excluded to compare source path (cheap) before page content (expensive)
- tree_dict to prevent duplicate entries from the dependency tree builder
- call_hierarchy_list so the agent distinguishes a missing function from an unreachable one
- direct_parents in __find_caller_function_dfs so root-level packages are searched before library packages, fixing nondeterministic JS transitive search timeouts

JS parser fixes
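Two of the JS parser fixes are shown in isolation below: the [\w$]+ identifier class and the 2000-character search cap. CLASS_NAME_RE, MAX_SCAN_CHARS, and extract_class_name are hypothetical, since the PR's actual patterns are not shown:

```python
import re
from typing import Optional

MAX_SCAN_CHARS = 2000  # hypothetical name for the PR's 2000-char cap

# [\w$]+ rather than \w+: JS identifiers may contain "$" (e.g. $Store, Foo$Bar)
CLASS_NAME_RE = re.compile(r"\bclass\s+([\w$]+)")


def extract_class_name(source: str) -> Optional[str]:
    # A declaration header sits near the start of a document, so scanning only
    # a bounded prefix caps regex work on very large bodies.
    match = CLASS_NAME_RE.search(source[:MAX_SCAN_CHARS])
    return match.group(1) if match else None
```

The cap bounds the input the regex engine sees rather than making the pattern itself linear. It limits the damage from backtracking instead of removing it.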
- Cap get_function_name regex searches at the first 2000 chars to prevent catastrophic backtracking on huge functions
- is_comment_line to handle *-prefixed JSDoc continuation lines vs generator *method() syntax
- _extract_class_name regex to support $ in JS identifiers (\w+ → [\w$]+)
- resolve_chain backreference [^\1] in string pattern (changed to .*?)
- Remove is_multiline parameter from _parse_declarations

RPM checker
- TARGET_IN_VULNERABLE_RANGE in VulnerabilityIntel.format_for_prompt() so the L1 agent sees the field referenced by the version-based fallback rules
- cve_verify_vuln_package LLM response instruction to one sentence

Config scanner
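Here is a sketch of the directory allowlist combined with the re.fullmatch fix listed for configuration_scanner.py under bug fixes. The allowlist contents, CONFIG_NAME_PATTERNS, and is_config_file are assumptions for illustration:

```python
import re
from pathlib import PurePosixPath

# Hypothetical contents; the PR names the allowlist but does not list it here.
CONFIG_DIR_ALLOWED_EXTENSIONS = frozenset(
    {".yaml", ".yml", ".json", ".toml", ".ini", ".cfg", ".conf", ".properties", ".xml"}
)

# A few of the filename patterns mentioned in the PR
CONFIG_NAME_PATTERNS = [
    re.compile(p)
    for p in (r"docker-compose.*\.ya?ml", r"pyproject\.toml", r"Makefile", r"CMakeLists\.txt")
]


def is_config_file(path: str, in_config_dir: bool) -> bool:
    name = PurePosixPath(path).name
    # fullmatch, not match: re.match("Makefile", "Makefile.bak") succeeds on the prefix
    if any(p.fullmatch(name) for p in CONFIG_NAME_PATTERNS):
        return True
    # Files collected only because of their directory must also carry an
    # allowlisted extension, which drops bundled .js/.css/.map/.java/.py files.
    return in_config_dir and PurePosixPath(name).suffix.lower() in CONFIG_DIR_ALLOWED_EXTENSIONS
```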
- _CONFIG_DIR_ALLOWED_EXTENSIONS allowlist for directory-matched files, filtering out .js/.css/.map/.java/.py from config collection
- pyproject.toml, setup.cfg, tox.ini, tsconfig.json, .eslintrc.json, Makefile, CMakeLists.txt, meson.build

Source code bug fixes (~40)
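The parse_cpe fix (split_cpe[10] instead of split_cpe[5]) follows from the CPE 2.3 formatted-string layout, where index 5 is the version and index 10 is target_sw. This is a minimal sketch, assuming "system" in that bullet refers to target_sw. parse_cpe23 is a hypothetical name:

```python
from typing import Dict

# cpe:2.3:part:vendor:product:version:update:edition:language:sw_edition:target_sw:target_hw:other
CPE23_FIELDS = (
    "cpe", "cpe_version", "part", "vendor", "product", "version", "update",
    "edition", "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(cpe: str) -> Dict[str, str]:
    # Naive split: assumes no escaped colons ("\:") inside field values
    parts = cpe.split(":")
    if len(parts) != len(CPE23_FIELDS) or parts[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_FIELDS, parts))
```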
- dep_tree.py missing comma causing --python venv_python string concatenation
- dep_tree.py C/C++ detect_ecosystem walk not using _WALK_EXCLUDE_DIRS
- dep_tree.py remove unused _ensure_venv method
- dep_tree.py deduplicate types-package candidates via dict.fromkeys
- c_segmenter_custom.py remove_comments stripping patterns inside string literals
- c_lang_function_parsers.py debug print statement left in production code
- golang_functions_parsers.py len(declaration_parts) == (2 or 3) comparison, which only ever checks for 2 because (2 or 3) evaluates to 2
- golang_functions_parsers.py no-op re.search("") call
- golang_functions_parsers.py is_package_imported raw string split, missing quote stripping, and unescaped identifier in regex
- golang_functions_parsers.py is_same_package crash on empty input
- python_functions_parser.py is_same_package returning True for two empty strings
- python_segmenters_with_classes_methods.py annotating all methods with the last class name and skipping async def methods
- source_code_git_loader.py safe.directory guard unnecessarily gated on clone_url
- brew_downloader.py returning a path with zero downloads and extracting the SRPM from the cache path instead of the target path
- configuration_scanner.py re.match allowing partial filename matches (fullmatch), max_results=0 returning 1 result, cache race condition on concurrent eviction, and missing docker-compose*.yaml pattern
- import_usage_analyzer.py empty short_name matching everything
- async_http_utils.py off-by-one in retry count, consumer errors caught by retry loop, retry_on_client_errors overridden by Retry-After check, negative sleep from past X-RateLimit-Reset, and missing @functools.wraps
- function_name_locator.py python_flow_control crash on non-function documents, Go versioned module short-name collision, and get_function_name ValueError exception handling
- git_commit_searcher.py _rank_results mutating confidence in-place
- git_repo_manager.py double-wrapping GitCommandError
- intel_utils.py parse_cpe checking split_cpe[5] instead of split_cpe[10] for system
- llm_engine_utils.py assert False in production code replaced with RuntimeError
- repo_resolver.py case-sensitivity bug in normalize_package_name dropping original case for mixed-case JSON keys
- serp_api_wrapper.py key index not reset after full rotation, dead max_retries field, and callers passing removed field
- csaf_generator.py GHSA description dropped when no pre-existing note, and notes appended with text: None
- web_patch_fetcher.py missing asyncio import, _is_commit_url false positive on /c/ outside kernel.org, dropping Gitiles commit URLs, yarl double-encoding %5E%21, and rewrite _fetch_gitiles_patch from sync requests to async aiohttp
- prompting.py build_tool_descriptions missing FL, CONFIG, IUA, GREP entries
- java_functions_parsers.py _count_call_args treating < comparison as generic bracket

Build & infra
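Below is one way to derive a cgroup-aware worker count for the ThreadPoolExecutor, assuming cgroup v2 (the quota is read from /sys/fs/cgroup/cpu.max) and falling back to os.cpu_count(). The helper names are assumptions, as is the upper bound of 32, which is borrowed from ThreadPoolExecutor's default of min(32, os.cpu_count() + 4) since Python 3.8:

```python
import os
from typing import Optional

CPU_MAX_PATH = "/sys/fs/cgroup/cpu.max"  # cgroup v2 CPU quota file


def cpus_from_cpu_max(content: str) -> Optional[int]:
    """Parse cgroup v2 cpu.max, formatted "<quota> <period>" or "max <period>"."""
    quota, _, period = content.strip().partition(" ")
    if quota == "max" or not period:
        return None  # no CPU quota configured
    return max(1, int(quota) // int(period))


def cgroup_aware_workers(upper_bound: int = 32) -> int:
    # os.cpu_count() reports the host's CPUs, not the container's CPU quota
    host_cpus = os.cpu_count() or 1
    try:
        with open(CPU_MAX_PATH) as f:
            quota_cpus = cpus_from_cpu_max(f.read())
    except (OSError, ValueError):
        quota_cpus = None  # cgroup v1, non-Linux host, or unparsable file
    return max(1, min(upper_bound, quota_cpus or host_cpus))
```

For example, ThreadPoolExecutor(max_workers=cgroup_aware_workers()). Flooring the quota (1.5 CPUs gives 1 worker) is a conservative choice. Rounding up would allow slight oversubscription.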
- ThreadPoolExecutor with cgroup-aware worker count
- Add -Dmaven.artifact.threads=10 to all mvn dependency:copy-dependencies invocations in dep_tree.py
- MAVEN_OPTS with -Dmaven.repo.local on shared PVC in on-cm-runner.yaml, on-pull-request.yaml, and exploit_iq_service.yaml
- Add GOCACHE env var to on-cm-runner.yaml, on-pull-request.yaml, and exploit_iq_service.yaml

Tests
- tests/location Java CCA (_can_reference_class, function_called_from_caller_body, __find_caller_function), JS parser/segmenter (backtracking cap, comment line, $ identifiers), C parser (argument-count filter, find_top_level_blocks), Go sub-package (FL, CCA, intel enrichment), RPM checker (format_for_prompt, VERSION GUARD), config scanner (allowlist, build patterns), and tools (FL, IUA, SerpAPI, git, async HTTP, web patch fetcher)