feat: tune prepared regex artifacts#77
Conversation
|
Warning Review limit reached
Next review available in: 43 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds ChangesArtifact Capture Policies and View Layer
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 654d9b13af
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Code Review
This pull request introduces RegexArtifactPolicy and PreparedArtifactPolicy to allow omitting prepared regex artifacts, along with zero-copy view types (such as PreparedTextSearchArtifactsView) for loading prepared artifacts. Feedback is provided regarding duplicated fallback logic for building a RegexSet from source, which could be extracted into a local closure to improve maintainability.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/core/src/lib.rs (2)
2351-2364: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winApply the omit policy to
prefilter_regexartifacts too.
build_prefilter_regexhardcodescapture_prepared = true, so a lazy regex withprepared_artifact_policy = Omitor globalregex_artifact_policy = Omitcan still serialize prepared regex bytes for its secondary prefilter. Computecapture_preparedbefore building the prefilter and pass it through.Proposed fix
- let prefilter_regex = lazy_options + let capture_prepared = lazy_options + .prepared_artifact_policy + .should_capture(options.regex_artifact_policy); + let prefilter_regex = lazy_options .prefilter_regex - .map(|source| build_prefilter_regex(source, regex_mode)) + .map(|source| { + build_prefilter_regex(source, regex_mode, capture_prepared) + }) .transpose()? .map(Box::new); - let capture_prepared = lazy_options - .prepared_artifact_policy - .should_capture(options.regex_artifact_policy); let prepared = capture_or_load_lazy_regex(fn build_prefilter_regex( source: String, regex_mode: &mut RegexBuildMode<'_>, + capture_prepared: bool, ) -> Result<regex_core::RegexSet> { build_regex_set( vec![source], @@ }, regex_mode, - true, + capture_prepared, ) }Also applies to: 2825-2836
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/core/src/lib.rs` around lines 2351 - 2364, The lazy regex path is still serializing prefilter artifacts even when the prepared artifact policy is Omit. Update the prefilter flow in the regex-loading logic around build_prefilter_regex, capture_or_load_lazy_regex, and the lazy_options.prepared_artifact_policy.should_capture decision so capture_prepared is computed once up front and passed through to prefilter construction. Make sure the same capture_prepared value is used for both the primary prepared regex and the secondary prefilter regex, including the equivalent code path that also applies to the other listed location.
224-238: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winKeep these public Rust structs backward compatible. Adding fields to
RegexPatternandTextSearchOptionsbreaks direct struct literals for Rust consumers; preserve a compatibility path or treat this as an intentional major-version bump.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/core/src/lib.rs` around lines 224 - 238, Keep RegexPattern and TextSearchOptions backward compatible for Rust consumers by preserving support for existing direct struct literals; either avoid exposing these new fields as required public additions, provide defaults/builders/compatibility constructors, or gate the change behind an intentional major-version bump. Update the public API in crates/core/src/lib.rs so existing code using RegexPattern and TextSearchOptions can still compile without specifying the newly added fields.
🧹 Nitpick comments (1)
crates/core/tests/text_search.rs (1)
606-660: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd an eager
Include-overrides-global-Omitregression test.The suite covers the override case for lazy regexes, but not for eager regexes. Since this PR adds separate eager capture logic, an eager-path regression where
PreparedArtifactPolicy::Includeis ignored would still pass these tests.Proposed test
+#[test] +fn prepared_eager_regex_artifacts_can_override_global_omit() { + let mut regex = RegexPattern::new(r"\bGamma-\d{4}\b"); + regex.prepared_artifact_policy = PreparedArtifactPolicy::Include; + let patterns = vec![PatternEntry::Regex(regex)]; + let options = TextSearchOptions { + regex_artifact_policy: RegexArtifactPolicy::Omit, + ..TextSearchOptions::default() + }; + + let artifacts = + TextSearch::prepare_artifacts(patterns.clone(), options).unwrap(); + assert_eq!(artifacts.regex_sets.len(), 1); + assert!( + artifacts + .regex_sets + .first() + .is_some_and(|artifact| !artifact.bytes.is_empty()) + ); + + let direct = TextSearch::new(patterns.clone(), options).unwrap(); + let prepared = + TextSearch::with_prepared_artifacts(patterns, options, &artifacts).unwrap(); + assert_eq!( + prepared.find_iter("Gamma-1234").unwrap(), + direct.find_iter("Gamma-1234").unwrap() + ); +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/core/tests/text_search.rs` around lines 606 - 660, Add a regression test in the text_search tests that covers eager regex artifacts honoring PreparedArtifactPolicy::Include even when TextSearchOptions::regex_artifact_policy is Omit; extend the existing prepared_artifacts/with_prepared_artifacts coverage around TextSearch::prepare_artifacts, TextSearch::new, and TextSearch::with_prepared_artifacts to assert the eager path still produces non-empty artifact bytes and matches direct search results, specifically for RegexPattern-based eager regexes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/core/src/lib.rs`:
- Around line 2351-2364: The lazy regex path is still serializing prefilter
artifacts even when the prepared artifact policy is Omit. Update the prefilter
flow in the regex-loading logic around build_prefilter_regex,
capture_or_load_lazy_regex, and the
lazy_options.prepared_artifact_policy.should_capture decision so
capture_prepared is computed once up front and passed through to prefilter
construction. Make sure the same capture_prepared value is used for both the
primary prepared regex and the secondary prefilter regex, including the
equivalent code path that also applies to the other listed location.
- Around line 224-238: Keep RegexPattern and TextSearchOptions backward
compatible for Rust consumers by preserving support for existing direct struct
literals; either avoid exposing these new fields as required public additions,
provide defaults/builders/compatibility constructors, or gate the change behind
an intentional major-version bump. Update the public API in
crates/core/src/lib.rs so existing code using RegexPattern and TextSearchOptions
can still compile without specifying the newly added fields.
---
Nitpick comments:
In `@crates/core/tests/text_search.rs`:
- Around line 606-660: Add a regression test in the text_search tests that
covers eager regex artifacts honoring PreparedArtifactPolicy::Include even when
TextSearchOptions::regex_artifact_policy is Omit; extend the existing
prepared_artifacts/with_prepared_artifacts coverage around
TextSearch::prepare_artifacts, TextSearch::new, and
TextSearch::with_prepared_artifacts to assert the eager path still produces
non-empty artifact bytes and matches direct search results, specifically for
RegexPattern-based eager regexes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 4ca2017e-615b-4603-abc5-d05c285c4c3e
📒 Files selected for processing (2)
crates/core/src/lib.rscrates/core/tests/text_search.rs
|
CC on behalf of @jan-kubica Addressed the actionable review items in d4562c4:
The public struct-literal compatibility note is acknowledged. This Rust core API is still being shaped before stable release; callers should prefer constructors and default-update options while these policies settle. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d4562c43a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| || regex_options.prepared_artifact_policy | ||
| != PreparedArtifactPolicy::Inherit | ||
| }) || pattern.alternation_count > options.max_alternations |
There was a problem hiding this comment.
Keep artifact policy from changing regex match order
Setting prepared_artifact_policy on an otherwise eager regex now sends it to isolated_regex. In mixed lists, shared regex engines are built before isolated ones, while merge_and_select tie-breaks equal start/equal length matches by incoming engine order; a cache-only choice like [Regex("abc", Omit), Regex("abc")] can therefore make find_iter/replace_all prefer the second pattern instead of the lower original index. Preserve original-pattern tie breaking or avoid reordering when applying the artifact policy.
Useful? React with 👍 / 👎.
Adds policy controls for prepared regex artifacts and borrowed prepared artifact loading for Rust consumers.\n\nValidation run locally:\n- cargo fmt --all --check\n- cargo clippy --workspace --all-targets --all-features -- -D warnings\n- cargo test --workspace --locked\n- bun run typecheck\n- bun run lint\n- bun test\n- bun run build\n- bun run test:runtime:bun\n- bun run test:runtime:node\n- bun run version:check\n- bun audit
Summary by CodeRabbit
New Features
Bug Fixes