perf: skip unused prepared regex compiles#118
Conversation
|
Warning Review limit reached
Next review available in: 36 minutes Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable usage-based reviews in Billing to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information, and refer to the rate limits docs for additional details. Review details⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds ChangesRegexSet fast-path optimization and tests
Provenance artifacts update
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a meta_regex_can_parse helper to optimize RegexSet compilation by bypassing the full MetaRegex::new check for fast-path patterns. However, the reviewer identified a critical correctness issue where syntax validation alone does not guarantee successful compilation (e.g., due to size limits), potentially causing fallback path regressions. The reviewer suggested restricting this optimization to when loading from a prepared artifact.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if !needs_slow && meta_regex_can_parse(&core) { | ||
| fast_cores.push(dfa_core); | ||
| fast_info.push(PatternInfo { | ||
| original_index: usize_to_u32("Pattern index", i)?, | ||
| verifier, | ||
| boundaries: eb, | ||
| individual: None, | ||
| has_internal_b: false, | ||
| // Fast path patterns never query individual or fallback state. | ||
| fancy_fallback: None, | ||
| }); | ||
| } else if let Ok(individual) = MetaRegex::new(&core) { |
There was a problem hiding this comment.
Correctness Issue: Fallback Path Regression for Large/Complex Patterns
Using meta_regex_can_parse only verifies that the pattern is syntactically valid. It does not guarantee that MetaRegex::new (or MetaRegex::new_many) will successfully compile it, as compilation can still fail due to engine-specific constraints (such as the default 10MB size limit in regex-automata).
The Problem
Previously, if a pattern failed individual MetaRegex compilation (e.g., due to size limits), it would safely fall back to the fancy-regex path in the else block. With this change, such a pattern will now bypass the individual compilation check, get added to fast_cores, and subsequently cause the entire RegexSet compilation to fail during build_prepared_multi.
The Solution
We should only skip individual compilation when we are loading from a prepared artifact (PreparedMode::Load). In that mode, we already know the patterns successfully compiled when the artifact was originally captured, making it 100% safe to skip individual compilation. For unprepared compilation (PreparedMode::None or PreparedMode::Capture), we should still perform the individual compilation check to preserve the fallback behavior.
let is_load = matches!(prepared, PreparedMode::Load { .. });
let mut individual = None;
let can_fast = !needs_slow && (is_load && meta_regex_can_parse(&core) || (!is_load && {
if let Ok(ind) = MetaRegex::new(&core) {
individual = Some(ind);
true
} else {
false
}
}));
if can_fast {
fast_cores.push(dfa_core);
fast_info.push(PatternInfo {
original_index: usize_to_u32("Pattern index", i)?,
verifier,
boundaries: eb,
individual: None,
has_internal_b: false,
// Fast path patterns never query individual or fallback state.
fancy_fallback: None,
});
} else if let Ok(individual) = match individual {
Some(ind) => Ok(ind),
None => MetaRegex::new(&core),
} {There was a problem hiding this comment.
CC on behalf of @sok0
Agreed, implemented in 3572e34. Fresh and capture builds now still probe MetaRegex::new so unsupported patterns can fall back; only prepared loads use the parse-only fast shortcut. Added regression coverage for that mode split.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9c37fe32d2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if !needs_slow && meta_regex_can_parse(&core) { | ||
| fast_cores.push(dfa_core); |
There was a problem hiding this comment.
Preserve fallback for parseable build failures
When a fast-path pattern parses successfully but cannot be built by MetaRegex (for example, a very large alternation or other pattern that trips regex-automata build limits), this branch now pushes it into fast_cores and the later build_prepared_multi call returns an error. Before this change, the per-pattern MetaRegex::new(&core) failure sent that pattern through the fallback/chunking path below, so these parseable-but-too-large patterns could still be accepted; this change turns them into construction failures instead of falling back.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
CC on behalf of @sok0
Agreed, this is addressed in 3572e34. Fresh and capture builds still probe MetaRegex::new so parseable-but-too-large patterns keep the fallback path; only prepared loads use the parse-only shortcut. The branch also has property coverage comparing prepared and unprepared behavior for generated literal sets.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
provenance/sbom.cdx.json (1)
3157-3190: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winRestore the
wasip2license block.Line 3157 introduces
pkg:cargo/wasip2@1.0.4+wasi-0.2.12without alicensesentry, but Line 78 ofprovenance/THIRD-PARTY-NOTICES.txtand Line 14 ofprovenance/report.jsonboth treat it as a licensed dependency. That leaves the machine-readable SBOM incomplete and makes the reported licensed-dependency count overstated. Regenerate this component with the same SPDX expression carried by the notice file.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@provenance/sbom.cdx.json` around lines 3157 - 3190, The SBOM entry for wasip2 is missing its licenses metadata, so the component in provenance/sbom.cdx.json is incomplete. Update the wasip2 component block to restore the same SPDX license expression already associated with this dependency in the notice/report artifacts, and keep the existing identifiers like bom-ref, purl, and version unchanged while adding the licenses entry back to the component definition.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/core/src/lib.rs`:
- Around line 118-119: The load-mode skip in meta_regex_can_parse is only doing
a syntax parse, which can disagree with MetaRegex::new(&core) and cause patterns
to be routed into fast_cores differently than prepare. Update
meta_regex_can_parse (and the Load-mode branch that uses it) to use the same
buildability check as MetaRegex construction, or otherwise persist and reuse the
original routing decision from the prepared artifact stream so both paths stay
aligned.
In `@provenance/sbom.cdx.json`:
- Around line 547-552: The SBOM is serializing combined licenses as a raw
license.id string instead of a valid SPDX expression, causing inconsistency with
the rest of the file. Update the SBOM generation path that emits these entries
so the combined-license cases use the same license.expression field as other
multi-license records, or split them into separate single-license entries. Make
sure the generator handles the affected combined-license values in
provenance/sbom.cdx.json consistently before regenerating the document.
---
Outside diff comments:
In `@provenance/sbom.cdx.json`:
- Around line 3157-3190: The SBOM entry for wasip2 is missing its licenses
metadata, so the component in provenance/sbom.cdx.json is incomplete. Update the
wasip2 component block to restore the same SPDX license expression already
associated with this dependency in the notice/report artifacts, and keep the
existing identifiers like bom-ref, purl, and version unchanged while adding the
licenses entry back to the component definition.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: cd68aba3-fbc7-4e0c-8a61-213b000c600a
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
crates/core/Cargo.tomlcrates/core/src/lib.rsprovenance/THIRD-PARTY-NOTICES.txtprovenance/report.jsonprovenance/sbom.cdx.json
CC on behalf of @sok0
Summary
Checks
Summary by CodeRabbit
New Features
Bug Fixes
Chores