Skip to content

chore: add version scanner to CI#17424

Closed
chalmerlowe wants to merge 51 commits into
mainfrom
feat/add-version-scanner
Closed

chore: add version scanner to CI#17424
chalmerlowe wants to merge 51 commits into
mainfrom
feat/add-version-scanner

Conversation

@chalmerlowe

Copy link
Copy Markdown
Contributor

Note

This is a WIP to test the use of the version scanner to provide protection against regressions that add back in references to recently removed EOL runtimes OR dependencies.

Adds distinct exit codes, --stdout support, workflow file, and fixes 3.10 truncation.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an Automated Dependency Version Scanner tool, which includes the core scanner script, a benchmarking utility, regex configuration rules, and a comprehensive test suite. The reviewer feedback highlights several key areas for improvement: resolving .scannerignore paths relative to the target directory being scanned, supporting wildcard matching in ignore patterns using fnmatch, cleaning up redundant code in the --stdout output block, and making file paths in integration and benchmark tests robust by resolving them relative to __file__.

Comment on lines +648 to +651
# Load ignore file from script directory (Option A)
script_dir = os.path.dirname(os.path.abspath(__file__))
ignore_file_path = os.path.join(script_dir, ".scannerignore")
ignore_dirs = load_ignore_file(ignore_file_path)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The README states that users can create a .scannerignore file in the directory being scanned (usually the repo root). However, the current implementation only loads .scannerignore from the script's directory (scripts/version_scanner/), meaning any .scannerignore placed in the target directory (e.g., the repository root) will be ignored.

We should update this to load .scannerignore from the target path (args.path), and fall back to the script's directory if not found.

Suggested change
# Load ignore file from script directory (Option A)
script_dir = os.path.dirname(os.path.abspath(__file__))
ignore_file_path = os.path.join(script_dir, ".scannerignore")
ignore_dirs = load_ignore_file(ignore_file_path)
# Load ignore file from the target directory being scanned, falling back to the script directory
ignore_file_path = os.path.join(args.path, ".scannerignore")
if not os.path.exists(ignore_file_path):
script_dir = os.path.dirname(os.path.abspath(__file__))
ignore_file_path = os.path.join(script_dir, ".scannerignore")
ignore_dirs = load_ignore_file(ignore_file_path)

Comment on lines +684 to +691
if args.stdout:
print("\n=== CSV Output ===")
import io
output = io.StringIO()
write_csv_report(output_path, all_matches, github_repo=args.github_repo, branch=args.branch) # this writes to the file, but we want stdout too
# Let's just read the file and print it
with open(output_path, 'r', encoding='utf-8') as f:
print(f.read(), end='')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If --stdout is enabled, the script currently imports io, creates an unused io.StringIO() object, and redundantly calls write_csv_report a second time (it was already called on line 679). We can simplify this block by removing the unused import, the unused variable, and the redundant function call, and simply read and print the already generated CSV file.

Suggested change
if args.stdout:
print("\n=== CSV Output ===")
import io
output = io.StringIO()
write_csv_report(output_path, all_matches, github_repo=args.github_repo, branch=args.branch) # this writes to the file, but we want stdout too
# Let's just read the file and print it
with open(output_path, 'r', encoding='utf-8') as f:
print(f.read(), end='')
if args.stdout:
print("\n=== CSV Output ===")
# Let's just read the file and print it
with open(output_path, 'r', encoding='utf-8') as f:
print(f.read(), end='')
References
  1. Remove duplicate lines of code, especially duplicate assertions in tests, to keep the codebase clean and avoid redundancy.

Comment on lines +476 to +480
# Prune ignore directories (case-insensitive)
dirs[:] = [d for d in dirs if d.lower() not in ignore_lower]

# Filter ignore files (case-insensitive)
files = [f for f in files if f.lower() not in ignore_lower]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The .scannerignore file contains wildcard patterns (e.g., *.jpg, *.png). However, the current filtering logic uses direct string membership (not in ignore_lower), which does not support wildcards or globbing. To support wildcard matching as described in the README, we can use fnmatch.fnmatchcase from the standard library.

Note: You will also need to add import fnmatch at the top of the file.

Suggested change
# Prune ignore directories (case-insensitive)
dirs[:] = [d for d in dirs if d.lower() not in ignore_lower]
# Filter ignore files (case-insensitive)
files = [f for f in files if f.lower() not in ignore_lower]
# Prune ignore directories (case-insensitive)
dirs[:] = [d for d in dirs if not any(fnmatch.fnmatchcase(d.lower(), pattern) for pattern in ignore_lower)]
# Filter ignore files (case-insensitive)
files = [f for f in files if not any(fnmatch.fnmatchcase(f.lower(), pattern) for pattern in ignore_lower)]

Comment on lines +21 to +26
# Paths to real tools
scanner_path = os.path.abspath("version_scanner.py")
config_path = os.path.abspath("regex_config.yaml")

# Static data directory
data_dir = os.path.abspath("tests/data")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The integration test currently uses relative paths like version_scanner.py and tests/data assuming the test is run from the scripts/version_scanner/ directory. If the tests are run from the repository root (which is common in CI/CD environments), these paths will resolve incorrectly and the test will fail.

We can make the paths robust and directory-independent by resolving them relative to __file__.

Suggested change
# Paths to real tools
scanner_path = os.path.abspath("version_scanner.py")
config_path = os.path.abspath("regex_config.yaml")
# Static data directory
data_dir = os.path.abspath("tests/data")
# Paths to real tools resolved relative to this test file
test_dir = os.path.dirname(os.path.abspath(__file__))
scanner_path = os.path.abspath(os.path.join(test_dir, "../../version_scanner.py"))
config_path = os.path.abspath(os.path.join(test_dir, "../../regex_config.yaml"))
# Static data directory
data_dir = os.path.abspath(os.path.join(test_dir, "../data"))

(packages_dir / "pkg1").mkdir()
(packages_dir / "pkg1" / "test.py").write_text("version = '3.7'\n")

scanner_path = "version_scanner.py"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the integration test, using a hardcoded relative path "version_scanner.py" will cause the test to fail if executed from any directory other than scripts/version_scanner/. We should resolve the path relative to __file__ to make the test robust.

Suggested change
scanner_path = "version_scanner.py"
test_dir = os.path.dirname(os.path.abspath(__file__))
scanner_path = os.path.abspath(os.path.join(test_dir, "../../version_scanner.py"))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant