chore: Adds version scanner CI/CD upgrades#17425
Conversation
Adds distinct exit codes, --stdout support, workflow file, and fixes 3.10 truncation.
There was a problem hiding this comment.
Code Review
This pull request introduces negative lookahead patterns to prevent version truncation bugs (e.g., matching 3.10 as 3.1), adds Excel-compatible formatting for matched strings, introduces a --stdout option, and updates exit codes for CI/CD integration. Feedback from the reviewer recommends restricting the Excel-specific wrapping to numeric/version strings to avoid formula errors, using pytest.raises for cleaner test assertions, and removing unused imports and redundant file writes in the stdout logic.
| $CSV_PREVIEW | ||
| \`\`\` | ||
| *(If there are more than 50 matches, see the workflow logs for the full list)*" | ||
|
|
There was a problem hiding this comment.
The following is a prototype of enabling the creation of an issue if/when future regressions are found, since the scans are intended to be run nightly OR as a post-submit (exact cadence is TBD during a later phase of the project).
| @@ -186,61 +186,100 @@ def scan_file(file_path: str, compiled_rules: List[Dict[str, re.Pattern]]) -> Li | |||
| return results | |||
|
|
|||
|
|
|||
There was a problem hiding this comment.
The following new and updated functions pull certain formatting logic out and isolate it so that we have a separation of concerns between building the output and formatting the output depending on where it goes:
- slim logs on stdout in CI/CD
- raw csv in both CI/CD and local use
- gSheets for local use
| - main | ||
| - '**version-scanner**' | ||
| schedule: | ||
| - cron: '0 * * * *' # Run hourly at the top of the hour |
There was a problem hiding this comment.
Currently set to run hourly (which will take effect once it gets merged to main) to help facilitate the development cycle for some upcoming features. Once development is done, we can set this to an appropriate cadence.
There was a problem hiding this comment.
What do you expect to use long term? Does it make sense to run both on a schedule, and on each commit?
daniel-sanche
left a comment
There was a problem hiding this comment.
LGTM to merge, but left a few questions about the longer term plans
| - main | ||
| - '**version-scanner**' | ||
| schedule: | ||
| - cron: '0 * * * *' # Run hourly at the top of the hour |
There was a problem hiding this comment.
What do you expect to use long term? Does it make sense to run both on a schedule, and on each commit?
| # Uses -o to output a detailed, raw CSV to a file | ||
| # Uses --stdout to print a slim, easier to parse summary to the GitHub Actions UI | ||
| # Uses --soft-fail to temporarily limit causing CI/CD failures during the migration to full operation. | ||
| python scripts/version_scanner/version_scanner.py -d python -v 3.7 --stdout -o version_scanner_output.csv --soft-fail |
There was a problem hiding this comment.
I saw in the output, it is looking for 3.7. Is this where that is configured? Can that be an envvar/argument?
Why search for just 3.7 specifically? Should we be checking for all outdated versions?
| run: | | ||
| # Uses -o to output a detailed, raw CSV to a file | ||
| # Uses --stdout to print a slim, easier to parse summary to the GitHub Actions UI | ||
| # Uses --soft-fail to temporarily limit causing CI/CD failures during the migration to full operation. |
There was a problem hiding this comment.
Is the plan to resolve/ignore the current alerts, and then remove --soft-fail?
There was a problem hiding this comment.
Regarding long term plans:
Q1: I do not feel that this issue is critical enough that a presubmit is required. I feel a nightly OR a post-submit is adequate so as to not slow down the normal PR process. The intent is to try to get the kinks worked out, confirm what type of burden this has on performance, and discuss with the team to reach a firm decision on when/how often to run the check, but we are not ready for that conversation yet.
Q2: This is a prototype implementation. The OG version_scanner only accepts one dependency and one version at a time. The implementation plan is to update it so that you can provide a list of runtimes OR dependencies and pair them with a list of versions:
i.e.
python 3.7, 3.8, 3.9 etc
protobuf 4.28.5, 5.16.7
Whatever is needed/whatever the most recent deprecations may be.
Q3: Yes, the plan is to mitigate any existing issues during this migration phase and then disable the --soft-fail in the workflow. Right now we have a number of false positives. We have a few true positives that might have slipped through the cracks. I wanna minimize any kerfuffle when this goes live.
Summary of Changes
This PR contains updates to the automated dependency version scanner tool and its associated CI/CD workflow to support decoupled formatting, clean console logs, and advisory (non-signalling) runs during rollout.
1. GitHub Actions (GHA) Workflow Modernization
mainand any branch matching'**version-scanner**'workflow_dispatchbutton in the GHA tab to simplify ad hoc testing and demos during development.2. Scanner Script Refactoring (Decoupled Formatters)
format_for_raw_csv: Generates clean, unformatted raw data for CSV reporting.format_for_spreadsheet: Wraps matches with Google Sheets formulas (such asHYPERLINKand string quotes to prevent float truncation) for Google Sheets upload.format_for_console: Prepares a slim, readable console string for stdout/logs (especially GHA logs).3. Output Simplification
4. Advisory Runs (
--soft-fail)--soft-failCLI flag to the python script to allow it to exit with code0even if version matches are found (allowing the scan to run and report findings in the logs without failing the GHA check and blocking merges during development and prototyping phases).--soft-failin the GHA workflow for now to support development.