chore: add version scanner to CI#17424
Conversation
…boundaries for explicit_version_string
… to .scannerignore
…and package-file details
…ayout-agnostic package naming
Adds distinct exit codes, --stdout support, workflow file, and fixes 3.10 truncation.
There was a problem hiding this comment.
Code Review
This pull request introduces an Automated Dependency Version Scanner tool, which includes the core scanner script, a benchmarking utility, regex configuration rules, and a comprehensive test suite. The reviewer feedback highlights several key areas for improvement: resolving .scannerignore paths relative to the target directory being scanned, supporting wildcard matching in ignore patterns using fnmatch, cleaning up redundant code in the --stdout output block, and making file paths in integration and benchmark tests robust by resolving them relative to __file__.
| # Load ignore file from script directory (Option A) | ||
| script_dir = os.path.dirname(os.path.abspath(__file__)) | ||
| ignore_file_path = os.path.join(script_dir, ".scannerignore") | ||
| ignore_dirs = load_ignore_file(ignore_file_path) |
There was a problem hiding this comment.
The README states that users can create a .scannerignore file in the directory being scanned (usually the repo root). However, the current implementation only loads .scannerignore from the script's directory (scripts/version_scanner/), meaning any .scannerignore placed in the target directory (e.g., the repository root) will be ignored.
We should update this to load .scannerignore from the target path (args.path), and fall back to the script's directory if not found.
| # Load ignore file from script directory (Option A) | |
| script_dir = os.path.dirname(os.path.abspath(__file__)) | |
| ignore_file_path = os.path.join(script_dir, ".scannerignore") | |
| ignore_dirs = load_ignore_file(ignore_file_path) | |
| # Load ignore file from the target directory being scanned, falling back to the script directory | |
| ignore_file_path = os.path.join(args.path, ".scannerignore") | |
| if not os.path.exists(ignore_file_path): | |
| script_dir = os.path.dirname(os.path.abspath(__file__)) | |
| ignore_file_path = os.path.join(script_dir, ".scannerignore") | |
| ignore_dirs = load_ignore_file(ignore_file_path) |
| if args.stdout: | ||
| print("\n=== CSV Output ===") | ||
| import io | ||
| output = io.StringIO() | ||
| write_csv_report(output_path, all_matches, github_repo=args.github_repo, branch=args.branch) # this writes to the file, but we want stdout too | ||
| # Let's just read the file and print it | ||
| with open(output_path, 'r', encoding='utf-8') as f: | ||
| print(f.read(), end='') |
There was a problem hiding this comment.
If --stdout is enabled, the script currently imports io, creates an unused io.StringIO() object, and redundantly calls write_csv_report a second time (it was already called on line 679). We can simplify this block by removing the unused import, the unused variable, and the redundant function call, and simply read and print the already generated CSV file.
| if args.stdout: | |
| print("\n=== CSV Output ===") | |
| import io | |
| output = io.StringIO() | |
| write_csv_report(output_path, all_matches, github_repo=args.github_repo, branch=args.branch) # this writes to the file, but we want stdout too | |
| # Let's just read the file and print it | |
| with open(output_path, 'r', encoding='utf-8') as f: | |
| print(f.read(), end='') | |
| if args.stdout: | |
| print("\n=== CSV Output ===") | |
| # Let's just read the file and print it | |
| with open(output_path, 'r', encoding='utf-8') as f: | |
| print(f.read(), end='') |
References
- Remove duplicate lines of code, especially duplicate assertions in tests, to keep the codebase clean and avoid redundancy.
| # Prune ignore directories (case-insensitive) | ||
| dirs[:] = [d for d in dirs if d.lower() not in ignore_lower] | ||
|
|
||
| # Filter ignore files (case-insensitive) | ||
| files = [f for f in files if f.lower() not in ignore_lower] |
There was a problem hiding this comment.
The .scannerignore file contains wildcard patterns (e.g., *.jpg, *.png). However, the current filtering logic uses direct string membership (not in ignore_lower), which does not support wildcards or globbing. To support wildcard matching as described in the README, we can use fnmatch.fnmatchcase from the standard library.
Note: You will also need to add import fnmatch at the top of the file.
| # Prune ignore directories (case-insensitive) | |
| dirs[:] = [d for d in dirs if d.lower() not in ignore_lower] | |
| # Filter ignore files (case-insensitive) | |
| files = [f for f in files if f.lower() not in ignore_lower] | |
| # Prune ignore directories (case-insensitive) | |
| dirs[:] = [d for d in dirs if not any(fnmatch.fnmatchcase(d.lower(), pattern) for pattern in ignore_lower)] | |
| # Filter ignore files (case-insensitive) | |
| files = [f for f in files if not any(fnmatch.fnmatchcase(f.lower(), pattern) for pattern in ignore_lower)] |
| # Paths to real tools | ||
| scanner_path = os.path.abspath("version_scanner.py") | ||
| config_path = os.path.abspath("regex_config.yaml") | ||
|
|
||
| # Static data directory | ||
| data_dir = os.path.abspath("tests/data") |
There was a problem hiding this comment.
The integration test currently uses relative paths like version_scanner.py and tests/data assuming the test is run from the scripts/version_scanner/ directory. If the tests are run from the repository root (which is common in CI/CD environments), these paths will resolve incorrectly and the test will fail.
We can make the paths robust and directory-independent by resolving them relative to __file__.
| # Paths to real tools | |
| scanner_path = os.path.abspath("version_scanner.py") | |
| config_path = os.path.abspath("regex_config.yaml") | |
| # Static data directory | |
| data_dir = os.path.abspath("tests/data") | |
| # Paths to real tools resolved relative to this test file | |
| test_dir = os.path.dirname(os.path.abspath(__file__)) | |
| scanner_path = os.path.abspath(os.path.join(test_dir, "../../version_scanner.py")) | |
| config_path = os.path.abspath(os.path.join(test_dir, "../../regex_config.yaml")) | |
| # Static data directory | |
| data_dir = os.path.abspath(os.path.join(test_dir, "../data")) |
| (packages_dir / "pkg1").mkdir() | ||
| (packages_dir / "pkg1" / "test.py").write_text("version = '3.7'\n") | ||
|
|
||
| scanner_path = "version_scanner.py" |
There was a problem hiding this comment.
Similar to the integration test, using a hardcoded relative path "version_scanner.py" will cause the test to fail if executed from any directory other than scripts/version_scanner/. We should resolve the path relative to __file__ to make the test robust.
| scanner_path = "version_scanner.py" | |
| test_dir = os.path.dirname(os.path.abspath(__file__)) | |
| scanner_path = os.path.abspath(os.path.join(test_dir, "../../version_scanner.py")) |
Note
This is a WIP to test the use of the version scanner to provide protection against regressions that add back in references to recently removed EOL runtimes OR dependencies.