Migrate commit-atomicity.yml to Rust (atomicity-checker)#720
Migrate commit-atomicity.yml to Rust (atomicity-checker)#720
Conversation
- Created `atomicity-checker` tool in `tools/atomicity-checker` - Implemented Git analysis using `git2` and file categorization using `globset` - Added CLI subcommands: `check`, `report`, `validate` - Implemented Terminal, JSON, and Markdown reporting - Created `.github/workflows/commit-atomicity.yml` to use the new tool - Updated `.github/workflows/build-tools.yml` to include the new tool in the build pipeline and pre-built binaries - Added unit tests for categorization logic This migration improves CI performance and maintainability by replacing complex shell scripts with a high-performance Rust implementation. Co-authored-by: iberi22 <10615454+iberi22@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request migrates the existing shell-based commit atomicity check to a new, highly performant native Rust tool named Highlights
Changelog
Ignored Files
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
📝 WalkthroughWalkthroughThis PR introduces a complete Rust-based reimplementation of the commit atomicity checker, migrating from shell scripting to a native tool. Changes include new CI workflows for building and executing the checker, workspace configuration updates, and a multi-module Rust implementation with Git integration, YAML configuration, commit analysis logic, and multi-format reporting capabilities. Changes
Sequence DiagramsequenceDiagram
participant CLI as CLI (main.rs)
participant Config as Config Loader
participant Git as Git Context
participant Checker as Atomicity Checker
participant Report as Report Generator
participant Output as Terminal/JSON/Markdown
CLI->>Config: load(config_path)
Config-->>CLI: Config{concern_patterns, bot_patterns, ...}
CLI->>Git: open(repo_path)
Git-->>CLI: GitContext
CLI->>Git: get_commits_in_range(range)
Git-->>CLI: Vec<Oid>
loop for each commit
CLI->>Git: get_commit_details(oid)
Git-->>CLI: (message, author)
CLI->>Git: get_changed_files(oid)
Git-->>CLI: Vec<String>
alt skip if bot
CLI->>Checker: is_bot(author)
Checker-->>CLI: true
else analyze
CLI->>Checker: analyze_commit(sha, message, author, files)
Checker->>Checker: categorize_file() per file
Checker->>Checker: compute concerns & is_atomic
Checker-->>CLI: CommitInfo
end
end
CLI->>Report: generate_markdown(AnalysisResult)
Report-->>CLI: String
CLI->>Output: print terminal/JSON/Markdown
Output-->>Output: display result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Rationale: Multiple interconnected source files with substantial logic density (git operations, pattern matching, atomicity analysis), new public API surface across six modules, and heterogeneous change patterns across configuration, CLI, analysis, and reporting layers. Integration points between Git context, configuration, checker, and report generation require careful validation for correctness. Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request successfully migrates the shell-based commit atomicity check to a native Rust tool, significantly improving performance and maintainability by leveraging clap for the CLI and git2 for repository analysis. However, a critical security vulnerability was identified in the Markdown report generation, where untrusted commit messages can inject arbitrary Markdown content, potentially leading to spoofing or phishing attacks in PR comments. Addressing this vulnerability is paramount. Additionally, the review suggests improvements for code idiomaticity, dependency management for better reproducibility, overall maintainability, and to resolve a minor functional regression.
| "| {} | `{}` | {} | {} |\n", | ||
| status, | ||
| short_sha, | ||
| commit.message.replace('|', "\\|"), |
There was a problem hiding this comment.
The generate_markdown function constructs a Markdown table using untrusted commit messages. While it escapes the pipe character |, it does not escape or remove newlines \n. A commit message containing newlines will break the Markdown table structure, allowing an attacker to inject arbitrary Markdown content (such as headers, links, or additional text) below the table. This could be used to spoof the report's results or conduct phishing attacks if the report is rendered in a browser (e.g., as a GitHub PR comment).
To remediate this, ensure that newlines in the commit message are replaced with spaces or otherwise escaped before being included in the Markdown table.
| commit.message.replace('|', "\\|"), | |
| commit.message.replace('|', "\\|").replace('\n', " ").replace('\r', ""), |
| serde_yaml = "0.9" | ||
| regex = { workspace = true } | ||
| anyhow = { workspace = true } | ||
| colored = "2" | ||
| git2 = "0.19" | ||
| globset = "0.4" |
There was a problem hiding this comment.
For better build reproducibility and to avoid potential issues from transitive dependencies, it's a good practice to pin your dependencies to more specific versions rather than using broad version requirements.
| serde_yaml = "0.9" | |
| regex = { workspace = true } | |
| anyhow = { workspace = true } | |
| colored = "2" | |
| git2 = "0.19" | |
| globset = "0.4" | |
| serde_yaml = "0.9.34" | |
| regex = { workspace = true } | |
| anyhow = { workspace = true } | |
| colored = "2.1.0" | |
| git2 = "0.19.0" | |
| globset = "0.4.14" |
| for re in &self.bot_regexes { | ||
| if re.is_match(author) { | ||
| return true; | ||
| } | ||
| } | ||
| false |
| // Fallback categorization logic if not matched by config | ||
| if path.contains("test") || path.contains("spec") { | ||
| return "tests".to_string(); | ||
| } | ||
| if path.ends_with(".md") { | ||
| return "docs".to_string(); | ||
| } |
There was a problem hiding this comment.
The fallback categorization logic appears to be a regression compared to the original shell script it replaces. The script had more comprehensive checks for test files (e.g., _test.rs, *.test.rs) and documentation files (e.g., README). Consider making this logic more robust to match the previous implementation, or ideally, making these fallback patterns configurable as well.
| impl Config { | ||
| pub fn load<P: AsRef<Path>>(path: P) -> Result<Self> { | ||
| let content = fs::read_to_string(path)?; | ||
| let config: Config = serde_yaml::from_str(&content)?; | ||
| Ok(config) | ||
| } | ||
|
|
||
| pub fn default() -> Self { | ||
| Config { | ||
| enabled: true, | ||
| mode: "warning".to_string(), | ||
| ignore_bots: true, | ||
| bot_patterns: vec![], | ||
| max_concerns: 1, | ||
| concern_patterns: HashMap::new(), | ||
| exclude_patterns: vec![], | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Instead of a standalone default() method, it's more idiomatic in Rust to implement the Default trait for the Config struct. This allows for usage like Config::default() and in struct initializations with ..Default::default(), improving consistency with the Rust ecosystem.
impl Config {
pub fn load<P: AsRef<Path>>(path: P) -> Result<Self> {
let content = fs::read_to_string(path)?;
let config: Config = serde_yaml::from_str(&content)?;
Ok(config)
}
}
impl Default for Config {
fn default() -> Self {
Self {
enabled: true,
mode: "warning".to_string(),
ignore_bots: true,
bot_patterns: vec![],
max_concerns: 1,
concern_patterns: HashMap::new(),
exclude_patterns: vec![],
}
}
}| let mut oids = Vec::new(); | ||
| for oid in revwalk { | ||
| oids.push(oid?); | ||
| } | ||
| Ok(oids) |
There was a problem hiding this comment.
This loop for collecting OIDs can be written more idiomatically and concisely using collect() on the iterator. This is a common pattern in Rust for collecting items from an iterator that returns Results into a collection.
| let mut oids = Vec::new(); | |
| for oid in revwalk { | |
| oids.push(oid?); | |
| } | |
| Ok(oids) | |
| revwalk.collect() |
| fn run_report(args: &Args, format: String) -> Result<()> { | ||
| let mut args_mod = Args { | ||
| command: None, | ||
| json: args.json, | ||
| markdown: args.markdown, | ||
| config: args.config.clone(), | ||
| range: args.range.clone(), | ||
| }; | ||
|
|
||
| match format.as_str() { | ||
| "json" => { | ||
| args_mod.json = true; | ||
| args_mod.markdown = false; | ||
| } | ||
| "markdown" => { | ||
| args_mod.json = false; | ||
| args_mod.markdown = true; | ||
| } | ||
| _ => { | ||
| args_mod.json = false; | ||
| args_mod.markdown = false; | ||
| } | ||
| } | ||
| run_check(&args_mod, None, None, None) | ||
| } |
There was a problem hiding this comment.
The run_report function re-implements argument parsing by creating and modifying a new Args struct just to call run_check. This creates unnecessary complexity and coupling between the two functions.
A cleaner approach would be to define an OutputFormat enum and pass it to run_check. This would decouple the core logic from the argument parsing structure.
For example:
enum OutputFormat {
Terminal,
Json,
Markdown,
}
// In run_check
fn run_check(..., output_format: OutputFormat) -> Result<()> {
// ...
match output_format {
OutputFormat::Json => { /* ... */ },
OutputFormat::Markdown => { /* ... */ },
OutputFormat::Terminal => { /* ... */ },
}
}| use crate::config::Config; | ||
| use colored::*; | ||
|
|
||
| pub fn print_terminal(result: &AnalysisResult, _config: &Config) { |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (4)
tools/atomicity-checker/src/main.rs (2)
22-28: Consider enforcing mutual exclusivity between--jsonand--markdownflags.Both flags can be set simultaneously, but only
--jsontakes effect due to the if-else chain inrun_check. This implicit precedence could confuse users. Clap providesconflicts_withto make this explicit.♻️ Suggested improvement
/// Output as JSON - #[arg(long, global = true)] + #[arg(long, global = true, conflicts_with = "markdown")] json: bool, /// Output as Markdown #[arg(long, global = true)] markdown: bool,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/atomicity-checker/src/main.rs` around lines 22 - 28, Add mutual-exclusion between the json and markdown CLI flags so users cannot pass both; update the struct fields for the flags (the json and markdown fields) to use clap's conflicts_with (e.g., add a conflicts_with reference on json pointing to "markdown" and vice versa) so clap enforces the exclusivity at parse time instead of relying on the run_check if/else behavior.
183-207: Consider simplifyingrun_reportby passing format directly.Constructing a modified
Argsstruct is verbose. Sincerun_checkonly usesjsonandmarkdownflags for output, consider extracting output format as a separate enum parameter.This would reduce coupling and make the intent clearer. Low priority given the current implementation works correctly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/atomicity-checker/src/main.rs` around lines 183 - 207, run_report currently builds a full Args clone just to toggle the json/markdown flags before calling run_check; instead introduce an OutputFormat enum (e.g. Json, Markdown, Default) and change run_check to accept an OutputFormat parameter (or add an overload) so run_report can call run_check(&args, Some(output_format), None, None) / run_check_with_format(args, output_format) without cloning or mutating Args; update the run_report function to map the incoming format string to the new OutputFormat and pass that to run_check, and remove the manual json/markdown assignments on Args (leave Args unchanged).tools/atomicity-checker/src/checker.rs (2)
99-107: Concerns list order is non-deterministic.
HashSet::into_iter().collect()yields elements in arbitrary order. While this doesn't affect correctness, it can cause output diffs between runs, complicating CI caching or output comparison.♻️ Proposed fix for consistent ordering
+ let mut concerns_vec: Vec<_> = concerns.into_iter().collect(); + concerns_vec.sort(); + CommitInfo { sha, message, author, - concerns: concerns.into_iter().collect(), + concerns: concerns_vec, count, is_atomic, files: file_infos, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/atomicity-checker/src/checker.rs` around lines 99 - 107, The concerns field in the CommitInfo construction is built with concerns.into_iter().collect(), which yields a non-deterministic order; to fix, collect into a Vec, sort it deterministically (e.g., sort_unstable()) and then assign that sorted Vec to the CommitInfo.concerns field (or collect into a BTreeSet if you want a sorted set), updating the code around the CommitInfo initializer (reference: the CommitInfo struct and the local variable/iterator named concerns and the into_iter().collect() call).
117-140: Consider expanding test coverage foris_botandanalyze_commit.The existing test validates categorization and exclusion, which is good. Adding tests for bot detection and the full
analyze_commitflow would improve confidence in edge cases.Example additions:
- Test
is_botwith matching/non-matching author patterns- Test
analyze_commitwith multiple files spanning multiple concerns to verify atomicity logic🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/atomicity-checker/src/checker.rs` around lines 117 - 140, Add unit tests covering Checker::is_bot and Checker::analyze_commit: create new tests that instantiate Checker via Checker::new with bot_patterns populated and assert is_bot returns true for matching author strings and false otherwise; and add at least one test that constructs a Config with multiple concern_patterns and files, then calls Checker::analyze_commit (or the public method that runs the atomicity check) with a simulated commit containing multiple changed files to verify the returned analysis exposes multiple concerns, violations, and respects exclude_patterns and max_concerns; use the existing test style in test_categorization to locate where to add tests and reference Checker, is_bot, and analyze_commit in assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/commit-atomicity.yml:
- Around line 33-36: The workflow step that runs the atomicity checker uses the
range string "origin/${{ github.base_ref }}..HEAD" but assumes the base branch
exists locally; update the job to fetch the base ref before running the checker
(e.g., add a git fetch origin "${{ github.base_ref }}" or a broader git fetch
--prune --unshallow origin +refs/heads/${{ github.base_ref
}}:refs/remotes/origin/${{ github.base_ref }}) so the referenced origin/${{
github.base_ref }} is present and the atomicity-checker range resolves
correctly.
In `@tools/atomicity-checker/src/checker.rs`:
- Around line 17-24: The iteration over config.concern_patterns (a HashMap) is
non-deterministic which can change which concern is chosen when a file matches
multiple patterns; make the ordering deterministic by collecting the entries
from config.concern_patterns into a Vec, sorting that Vec by the concern key (or
by a stable comparator), then building concern_globs from that sorted list using
GlobSetBuilder and builder.build() so the first-match logic remains stable;
update the loop that constructs concern_globs (and any subsequent loop that
assigns concerns using those globsets) to use the sorted sequence instead of
iterating the HashMap directly.
- Around line 64-72: The fallback categorization uses case-sensitive substring
checks on the path variable and misses paths like "Tests" or "Spec"; modify the
logic in the block that checks path.contains("test") / path.contains("spec") and
path.ends_with(".md") to operate on a normalized lowercase string (e.g., let lc
= path.to_lowercase()) and use lc.contains("test"), lc.contains("spec"), and
lc.ends_with(".md") so matching becomes case-insensitive while preserving the
same return values ("tests", "docs", "other").
In `@tools/atomicity-checker/src/git.rs`:
- Around line 40-44: The diff iteration currently only uses
delta.new_file().path() inside the closure passed to diff.foreach, so deleted
files (where new_file().path() is None) are skipped; update the closure to fall
back to delta.old_file().path() when new_file().path() is None (e.g., try
new_file().path().or_else(|| old_file().path())) and push that path into files
so both added and deleted paths are captured for classification (refer to
diff.foreach, the closure, delta.new_file(), delta.old_file(), and the files
vector).
In `@tools/atomicity-checker/src/main.rs`:
- Around line 117-126: When git.get_commits_in_range(&args.range) fails the Err
is discarded; change the match to capture the error (e.g., Err(e)) and include e
in the output so callers can see why the range lookup failed. Update the error
branch that currently prints "No commits found..." to include the error context
(e.g., append or log e) when args.json/markdown are false, and still return
Ok(()) afterwards; reference git.get_commits_in_range, args.range and
commits_to_analyze to locate and modify the failing match arm.
In `@tools/atomicity-checker/src/report.rs`:
- Line 10: The code slices commit.sha with [..8] which can panic for short
strings; replace both occurrences where short_sha (and the similar slice at line
72) is created with a safe prefix extraction like
commit.sha.get(..8).unwrap_or(&commit.sha) (or equivalent using chars().take(8))
so you never index out of bounds; update the places that assign to short_sha to
use this safe getter.
- Around line 73-79: The Markdown row builder (md.push_str) currently only
escapes pipe characters in commit.message, but newline characters still break
table rows; update the formatting so commit.message is first normalized to
remove or replace line breaks (e.g., replace '\r' and '\n' with a single space
or '<br>') and then escape '|' before inserting into md.push_str; also apply the
same newline normalization to commit.concerns (or to the string produced by
commit.concerns.join(", ")) so neither field can introduce newlines that break
the markdown table—modify the expression around commit.message and
commit.concerns where used in md.push_str to perform these replacements.
---
Nitpick comments:
In `@tools/atomicity-checker/src/checker.rs`:
- Around line 99-107: The concerns field in the CommitInfo construction is built
with concerns.into_iter().collect(), which yields a non-deterministic order; to
fix, collect into a Vec, sort it deterministically (e.g., sort_unstable()) and
then assign that sorted Vec to the CommitInfo.concerns field (or collect into a
BTreeSet if you want a sorted set), updating the code around the CommitInfo
initializer (reference: the CommitInfo struct and the local variable/iterator
named concerns and the into_iter().collect() call).
- Around line 117-140: Add unit tests covering Checker::is_bot and
Checker::analyze_commit: create new tests that instantiate Checker via
Checker::new with bot_patterns populated and assert is_bot returns true for
matching author strings and false otherwise; and add at least one test that
constructs a Config with multiple concern_patterns and files, then calls
Checker::analyze_commit (or the public method that runs the atomicity check)
with a simulated commit containing multiple changed files to verify the returned
analysis exposes multiple concerns, violations, and respects exclude_patterns
and max_concerns; use the existing test style in test_categorization to locate
where to add tests and reference Checker, is_bot, and analyze_commit in
assertions.
In `@tools/atomicity-checker/src/main.rs`:
- Around line 22-28: Add mutual-exclusion between the json and markdown CLI
flags so users cannot pass both; update the struct fields for the flags (the
json and markdown fields) to use clap's conflicts_with (e.g., add a
conflicts_with reference on json pointing to "markdown" and vice versa) so clap
enforces the exclusivity at parse time instead of relying on the run_check
if/else behavior.
- Around line 183-207: run_report currently builds a full Args clone just to
toggle the json/markdown flags before calling run_check; instead introduce an
OutputFormat enum (e.g. Json, Markdown, Default) and change run_check to accept
an OutputFormat parameter (or add an overload) so run_report can call
run_check(&args, Some(output_format), None, None) / run_check_with_format(args,
output_format) without cloning or mutating Args; update the run_report function
to map the incoming format string to the new OutputFormat and pass that to
run_check, and remove the manual json/markdown assignments on Args (leave Args
unchanged).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
.github/workflows/build-tools.yml.github/workflows/commit-atomicity.ymlCargo.tomltools/atomicity-checker/Cargo.tomltools/atomicity-checker/src/checker.rstools/atomicity-checker/src/config.rstools/atomicity-checker/src/git.rstools/atomicity-checker/src/main.rstools/atomicity-checker/src/models.rstools/atomicity-checker/src/report.rs
| run: | | ||
| ./target/release/atomicity-checker \ | ||
| --range "origin/${{ github.base_ref }}..HEAD" \ | ||
| --config ".github/atomicity-config.yml" |
There was a problem hiding this comment.
Fetch the base ref before computing origin/<base>..HEAD.
Line 35 assumes origin/${{ github.base_ref }} is present locally. In PR checkouts this can be missing, causing range resolution failures and a false-red workflow.
Suggested workflow patch
- name: 🦀 Setup Rust
uses: dtolnay/rust-toolchain@stable
+ - name: 📥 Fetch base branch ref
+ run: |
+ git fetch --no-tags --prune --depth=1 origin \
+ +refs/heads/${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}
+
- name: 🏗️ Build Atomicity Checker
run: cargo build --release -p atomicity-checker🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/commit-atomicity.yml around lines 33 - 36, The workflow
step that runs the atomicity checker uses the range string "origin/${{
github.base_ref }}..HEAD" but assumes the base branch exists locally; update the
job to fetch the base ref before running the checker (e.g., add a git fetch
origin "${{ github.base_ref }}" or a broader git fetch --prune --unshallow
origin +refs/heads/${{ github.base_ref }}:refs/remotes/origin/${{
github.base_ref }}) so the referenced origin/${{ github.base_ref }} is present
and the atomicity-checker range resolves correctly.
| let mut concern_globs = Vec::new(); | ||
| for (concern, patterns) in &config.concern_patterns { | ||
| let mut builder = GlobSetBuilder::new(); | ||
| for pattern in patterns { | ||
| builder.add(Glob::new(pattern)?); | ||
| } | ||
| concern_globs.push((concern.clone(), builder.build()?)); | ||
| } |
There was a problem hiding this comment.
Non-deterministic iteration order may cause inconsistent categorization.
config.concern_patterns is a HashMap, and iterating over it yields entries in arbitrary order. If a file matches multiple concern patterns, the first match wins (line 59-61), leading to potentially different results across runs.
🔧 Proposed fix to ensure deterministic ordering
+ let mut sorted_concerns: Vec<_> = config.concern_patterns.iter().collect();
+ sorted_concerns.sort_by_key(|(k, _)| k.clone());
+
let mut concern_globs = Vec::new();
- for (concern, patterns) in &config.concern_patterns {
+ for (concern, patterns) in sorted_concerns {
let mut builder = GlobSetBuilder::new();
for pattern in patterns {
builder.add(Glob::new(pattern)?);
}
concern_globs.push((concern.clone(), builder.build()?));
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let mut concern_globs = Vec::new(); | |
| for (concern, patterns) in &config.concern_patterns { | |
| let mut builder = GlobSetBuilder::new(); | |
| for pattern in patterns { | |
| builder.add(Glob::new(pattern)?); | |
| } | |
| concern_globs.push((concern.clone(), builder.build()?)); | |
| } | |
| let mut sorted_concerns: Vec<_> = config.concern_patterns.iter().collect(); | |
| sorted_concerns.sort_by_key(|(k, _)| k.clone()); | |
| let mut concern_globs = Vec::new(); | |
| for (concern, patterns) in sorted_concerns { | |
| let mut builder = GlobSetBuilder::new(); | |
| for pattern in patterns { | |
| builder.add(Glob::new(pattern)?); | |
| } | |
| concern_globs.push((concern.clone(), builder.build()?)); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/atomicity-checker/src/checker.rs` around lines 17 - 24, The iteration
over config.concern_patterns (a HashMap) is non-deterministic which can change
which concern is chosen when a file matches multiple patterns; make the ordering
deterministic by collecting the entries from config.concern_patterns into a Vec,
sorting that Vec by the concern key (or by a stable comparator), then building
concern_globs from that sorted list using GlobSetBuilder and builder.build() so
the first-match logic remains stable; update the loop that constructs
concern_globs (and any subsequent loop that assigns concerns using those
globsets) to use the sorted sequence instead of iterating the HashMap directly.
| // Fallback categorization logic if not matched by config | ||
| if path.contains("test") || path.contains("spec") { | ||
| return "tests".to_string(); | ||
| } | ||
| if path.ends_with(".md") { | ||
| return "docs".to_string(); | ||
| } | ||
|
|
||
| "other".to_string() |
There was a problem hiding this comment.
Fallback categorization is case-sensitive, potentially missing test files.
The substring checks path.contains("test") and path.contains("spec") are case-sensitive. Paths like Tests/, __TESTS__/, or Spec/ would fall through to "other".
🔧 Proposed fix for case-insensitive matching
// Fallback categorization logic if not matched by config
- if path.contains("test") || path.contains("spec") {
+ let path_lower = path.to_lowercase();
+ if path_lower.contains("test") || path_lower.contains("spec") {
return "tests".to_string();
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Fallback categorization logic if not matched by config | |
| if path.contains("test") || path.contains("spec") { | |
| return "tests".to_string(); | |
| } | |
| if path.ends_with(".md") { | |
| return "docs".to_string(); | |
| } | |
| "other".to_string() | |
| // Fallback categorization logic if not matched by config | |
| let path_lower = path.to_lowercase(); | |
| if path_lower.contains("test") || path_lower.contains("spec") { | |
| return "tests".to_string(); | |
| } | |
| if path.ends_with(".md") { | |
| return "docs".to_string(); | |
| } | |
| "other".to_string() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/atomicity-checker/src/checker.rs` around lines 64 - 72, The fallback
categorization uses case-sensitive substring checks on the path variable and
misses paths like "Tests" or "Spec"; modify the logic in the block that checks
path.contains("test") / path.contains("spec") and path.ends_with(".md") to
operate on a normalized lowercase string (e.g., let lc = path.to_lowercase())
and use lc.contains("test"), lc.contains("spec"), and lc.ends_with(".md") so
matching becomes case-insensitive while preserving the same return values
("tests", "docs", "other").
| diff.foreach( | ||
| &mut |delta, _| { | ||
| if let Some(path) = delta.new_file().path() { | ||
| files.push(path.to_string_lossy().into_owned()); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# First, let's see the file structure and find the exact lines
fd git.rs tools/atomicity-checker/src/Repository: iberi22/synapse-protocol
Length of output: 102
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read the relevant section with context
sed -n '35,50p' tools/atomicity-checker/src/git.rsRepository: iberi22/synapse-protocol
Length of output: 552
🌐 Web query:
git2-rs DiffDelta deleted files new_file path old_file path behavior
💡 Result:
In git2-rs, DiffDelta is a thin wrapper over libgit2’s git_diff_delta, where:
old_file()is the “from” side,new_file()is the “to” side. (libgit2.org)- Either side can be absent: old may be absent for an added file; new may be absent for a deleted file. (libgit2.org)
Because of that, DiffFile::path() returns an Option<&Path> (it can be null). (docs.rs)
Practical behavior you’ll see
-
Deleted file delta (
Delta::Deleted):delta.old_file().path()→Some(old_path)delta.new_file().path()→None(the “new” side doesn’t exist)
-
Added file delta (
Delta::Added):old_file().path()→Nonenew_file().path()→Some(new_path)
-
Renamed / Copied: usually both are
Some(...)and differ.
So for “the path of the file in this delta”, common practice is:
let path = delta.old_file().path().or(delta.new_file().path());This pattern is used in real code specifically to handle added/deleted cases. (origen-sdk.org)
Note: if you generate a diff with libgit2’s “reverse” option, the meaning of old/new is flipped. (libgit2.org)
Citations:
- 1: https://libgit2.org/docs/reference/main/diff/git_diff_delta.html?utm_source=openai
- 2: https://libgit2.org/docs/reference/main/diff/git_diff_delta.html?utm_source=openai
- 3: https://docs.rs/git2/latest/git2/struct.DiffFile.html?utm_source=openai
- 4: https://origen-sdk.org/o2/_static/build/rustdoc/origen/doc/src/origen/revision_control/git.rs.html?utm_source=openai
- 5: https://libgit2.org/docs/reference/main/diff/git_diff_delta.html?utm_source=openai
Handle deleted files in diff classification to prevent incomplete atomicity results.
The current code only checks delta.new_file().path(), which returns None for deleted files in git2-rs. This causes deleted files to be omitted from diff classification, leading to incomplete and potentially incorrect atomicity analysis.
For deleted files, delta.old_file().path() contains the path (the "new" side does not exist). Use a fallback pattern to capture both added and deleted files:
Suggested fix
diff.foreach(
&mut |delta, _| {
- if let Some(path) = delta.new_file().path() {
+ let path = delta
+ .new_file()
+ .path()
+ .or_else(|| delta.old_file().path());
+ if let Some(path) = path {
files.push(path.to_string_lossy().into_owned());
}
true
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| diff.foreach( | |
| &mut |delta, _| { | |
| if let Some(path) = delta.new_file().path() { | |
| files.push(path.to_string_lossy().into_owned()); | |
| } | |
| diff.foreach( | |
| &mut |delta, _| { | |
| let path = delta | |
| .new_file() | |
| .path() | |
| .or_else(|| delta.old_file().path()); | |
| if let Some(path) = path { | |
| files.push(path.to_string_lossy().into_owned()); | |
| } | |
| true | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/atomicity-checker/src/git.rs` around lines 40 - 44, The diff iteration
currently only uses delta.new_file().path() inside the closure passed to
diff.foreach, so deleted files (where new_file().path() is None) are skipped;
update the closure to fall back to delta.old_file().path() when
new_file().path() is None (e.g., try new_file().path().or_else(||
old_file().path())) and push that path into files so both added and deleted
paths are captured for classification (refer to diff.foreach, the closure,
delta.new_file(), delta.old_file(), and the files vector).
| match git.get_commits_in_range(&args.range) { | ||
| Ok(oids) => commits_to_analyze = oids, | ||
| Err(_) => { | ||
| if !args.json && !args.markdown { | ||
| println!("{}", "No commits found in range or range invalid. Skipping analysis.".yellow()); | ||
| } | ||
| return Ok(()); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Error context is discarded, making debugging difficult.
When get_commits_in_range fails, the actual error is swallowed. For a CI tool, knowing why a range is invalid (e.g., missing remote, bad ref syntax) aids troubleshooting.
🛠️ Proposed fix to preserve error context
match git.get_commits_in_range(&args.range) {
Ok(oids) => commits_to_analyze = oids,
- Err(_) => {
+ Err(e) => {
if !args.json && !args.markdown {
- println!("{}", "No commits found in range or range invalid. Skipping analysis.".yellow());
+ eprintln!("{}: {} ({})", "Warning".yellow(), "No commits found in range or range invalid. Skipping analysis.", e);
}
return Ok(());
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| match git.get_commits_in_range(&args.range) { | |
| Ok(oids) => commits_to_analyze = oids, | |
| Err(_) => { | |
| if !args.json && !args.markdown { | |
| println!("{}", "No commits found in range or range invalid. Skipping analysis.".yellow()); | |
| } | |
| return Ok(()); | |
| } | |
| } | |
| } | |
| match git.get_commits_in_range(&args.range) { | |
| Ok(oids) => commits_to_analyze = oids, | |
| Err(e) => { | |
| if !args.json && !args.markdown { | |
| eprintln!("{}: {} ({})", "Warning".yellow(), "No commits found in range or range invalid. Skipping analysis.", e); | |
| } | |
| return Ok(()); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/atomicity-checker/src/main.rs` around lines 117 - 126, When
git.get_commits_in_range(&args.range) fails the Err is discarded; change the
match to capture the error (e.g., Err(e)) and include e in the output so callers
can see why the range lookup failed. Update the error branch that currently
prints "No commits found..." to include the error context (e.g., append or log
e) when args.json/markdown are false, and still return Ok(()) afterwards;
reference git.get_commits_in_range, args.range and commits_to_analyze to locate
and modify the failing match arm.
| println!("{}", "=".repeat(50)); | ||
|
|
||
| for commit in &result.commits { | ||
| let short_sha = &commit.sha[..8]; |
There was a problem hiding this comment.
Guard SHA shortening against panics.
Line 10 and Line 72 use [..8] slicing, which can panic if the value is shorter than 8 bytes. Safe prefix extraction avoids runtime crashes on malformed/test data.
Suggested fix
- let short_sha = &commit.sha[..8];
+ let short_sha = commit.sha.get(..8).unwrap_or(&commit.sha);
...
- let short_sha = &commit.sha[..8];
+ let short_sha = commit.sha.get(..8).unwrap_or(&commit.sha);Also applies to: 72-72
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/atomicity-checker/src/report.rs` at line 10, The code slices commit.sha
with [..8] which can panic for short strings; replace both occurrences where
short_sha (and the similar slice at line 72) is created with a safe prefix
extraction like commit.sha.get(..8).unwrap_or(&commit.sha) (or equivalent using
chars().take(8)) so you never index out of bounds; update the places that assign
to short_sha to use this safe getter.
| md.push_str(&format!( | ||
| "| {} | `{}` | {} | {} |\n", | ||
| status, | ||
| short_sha, | ||
| commit.message.replace('|', "\\|"), | ||
| commit.concerns.join(", ") | ||
| )); |
There was a problem hiding this comment.
Sanitize line breaks before writing markdown table rows.
Line 77 escapes |, but newline characters in commit messages can still break table formatting in PR comments.
Suggested fix
- md.push_str(&format!(
+ let safe_message = commit
+ .message
+ .replace('|', "\\|")
+ .replace('\n', " ")
+ .replace('\r', " ");
+ md.push_str(&format!(
"| {} | `{}` | {} | {} |\n",
status,
short_sha,
- commit.message.replace('|', "\\|"),
+ safe_message,
commit.concerns.join(", ")
));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/atomicity-checker/src/report.rs` around lines 73 - 79, The Markdown row
builder (md.push_str) currently only escapes pipe characters in commit.message,
but newline characters still break table rows; update the formatting so
commit.message is first normalized to remove or replace line breaks (e.g.,
replace '\r' and '\n' with a single space or '<br>') and then escape '|' before
inserting into md.push_str; also apply the same newline normalization to
commit.concerns (or to the string produced by commit.concerns.join(", ")) so
neither field can introduce newlines that break the markdown table—modify the
expression around commit.message and commit.concerns where used in md.push_str
to perform these replacements.
The shell-based commit atomicity check has been migrated to a native Rust tool named
atomicity-checker.Key features of the new tool:
git2..github/atomicity-config.ymlwithglobsetfor precise file categorization.check,report,validate) as requested..github/workflows/commit-atomicity.ymlhas been added, and the build pipeline was updated to include the tool in the repository's pre-built binaries.This change ensures that all PRs follow the atomic commit principle with minimal CI overhead.
Fixes #692
PR created automatically by Jules for task 16410874462142133438 started by @iberi22
Summary by CodeRabbit
Release Notes
New Features
Chores