Skip to content

feat: tune prepared regex artifacts#77

Open
jan-kubica wants to merge 3 commits into
mainfrom
codex/lazy-regex-artifact-policy
Open

feat: tune prepared regex artifacts#77
jan-kubica wants to merge 3 commits into
mainfrom
codex/lazy-regex-artifact-policy

Conversation

@jan-kubica

@jan-kubica jan-kubica commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Adds policy controls for prepared regex artifacts and borrowed prepared artifact loading for Rust consumers.\n\nValidation run locally:\n- cargo fmt --all --check\n- cargo clippy --workspace --all-targets --all-features -- -D warnings\n- cargo test --workspace --locked\n- bun run typecheck\n- bun run lint\n- bun test\n- bun run build\n- bun run test:runtime:bun\n- bun run test:runtime:node\n- bun run version:check\n- bun audit

Summary by CodeRabbit

  • New Features

    • Added support for configuring whether prepared search artifacts are included or omitted.
    • Introduced new search-building options that can work with borrowed prepared artifacts for more efficient setup.
  • Bug Fixes

    • Improved search behavior when prepared regex artifacts are unavailable, ensuring searches still fall back correctly and continue matching as expected.
    • Added coverage for omission rules across both lazy and eager regex patterns.

@jan-kubica jan-kubica marked this pull request as ready for review June 30, 2026 07:09
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@jan-kubica, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 43 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1e0e4146-331e-46aa-849f-835089bf5210

📥 Commits

Reviewing files that changed from the base of the PR and between 654d9b1 and d4562c4.

📒 Files selected for processing (2)
  • crates/core/src/lib.rs
  • crates/core/tests/text_search.rs
📝 Walkthrough

Walkthrough

Adds RegexArtifactPolicy and PreparedArtifactPolicy enums to control whether prepared regex artifacts are captured or omitted, wires them into TextSearchOptions, RegexPattern, and RegexOptions. Introduces borrowed PreparedTextSearchArtifactsView types to avoid cloning bytes, refactors from_bytes and build/load paths to use views, and adds four view-based TextSearch constructors. Regression tests validate artifact omission under various policy configurations.

Changes

Artifact Capture Policies and View Layer

Layer / File(s) Summary
Policy enums and options model
crates/core/src/lib.rs
Adds RegexArtifactPolicy and PreparedArtifactPolicy enums with should_capture logic; wires regex_artifact_policy into TextSearchOptions (defaulting to Include); adds prepared_artifact_policy field to RegexPattern (defaulting to Inherit) and RegexOptions.
Borrowed artifact view structs and from_bytes refactor
crates/core/src/lib.rs
Defines PreparedTextSearchArtifactsView<'a>, PreparedAhoArtifactView<'a>, PreparedRegexArtifactView<'a> with &[u8] slices; updates from_bytes to parse into borrowed views; adds as_view() and into_owned() conversions; extends PreparedArtifactBytes trait to cover view types; updates AhoBuildMode::Load and RegexBuildMode::Load to accept view slices.
View-based TextSearch constructors
crates/core/src/lib.rs
Adds with_prepared_artifacts_view, with_prepared_artifacts_view_build_stats, with_prepared_all_literal_artifacts_view, and with_prepared_all_literal_artifacts_view_build_stats public constructors taking &PreparedTextSearchArtifactsView.
Pattern classification and build policy enforcement
crates/core/src/lib.rs
Updates classify_pattern_entries to carry prepared_artifact_policy into RegexOptions; forces isolation in partition_classified_patterns for non-Inherit policies; updates lazy and eager paths in build_regex_engine to compute capture_prepared and thread it into build_regex_set and capture_or_load_lazy_regex; adds should_capture_eager_regex_artifact helper; updates build_prefilter_regex call site.
Regression tests
crates/core/tests/text_search.rs
Imports PreparedArtifactPolicy and RegexArtifactPolicy; adds tests for lazy/eager regex omission by explicit policy, global omit overridden by per-pattern include, and correct match results in all cases.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 51.16% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely reflects the main change: adjusting prepared regex artifact behavior and policies.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/lazy-regex-artifact-policy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 654d9b13af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/core/src/lib.rs Outdated

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces RegexArtifactPolicy and PreparedArtifactPolicy to allow omitting prepared regex artifacts, along with zero-copy view types (such as PreparedTextSearchArtifactsView) for loading prepared artifacts. Feedback is provided regarding duplicated fallback logic for building a RegexSet from source, which could be extracted into a local closure to improve maintainability.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread crates/core/src/lib.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/core/src/lib.rs (2)

2351-2364: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Apply the omit policy to prefilter_regex artifacts too.

build_prefilter_regex hardcodes capture_prepared = true, so a lazy regex with prepared_artifact_policy = Omit or global regex_artifact_policy = Omit can still serialize prepared regex bytes for its secondary prefilter. Compute capture_prepared before building the prefilter and pass it through.

Proposed fix
-      let prefilter_regex = lazy_options
+      let capture_prepared = lazy_options
+        .prepared_artifact_policy
+        .should_capture(options.regex_artifact_policy);
+      let prefilter_regex = lazy_options
         .prefilter_regex
-        .map(|source| build_prefilter_regex(source, regex_mode))
+        .map(|source| {
+          build_prefilter_regex(source, regex_mode, capture_prepared)
+        })
         .transpose()?
         .map(Box::new);
-      let capture_prepared = lazy_options
-        .prepared_artifact_policy
-        .should_capture(options.regex_artifact_policy);
       let prepared = capture_or_load_lazy_regex(
 fn build_prefilter_regex(
   source: String,
   regex_mode: &mut RegexBuildMode<'_>,
+  capture_prepared: bool,
 ) -> Result<regex_core::RegexSet> {
   build_regex_set(
     vec![source],
@@
     },
     regex_mode,
-    true,
+    capture_prepared,
   )
 }

Also applies to: 2825-2836

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/core/src/lib.rs` around lines 2351 - 2364, The lazy regex path is
still serializing prefilter artifacts even when the prepared artifact policy is
Omit. Update the prefilter flow in the regex-loading logic around
build_prefilter_regex, capture_or_load_lazy_regex, and the
lazy_options.prepared_artifact_policy.should_capture decision so
capture_prepared is computed once up front and passed through to prefilter
construction. Make sure the same capture_prepared value is used for both the
primary prepared regex and the secondary prefilter regex, including the
equivalent code path that also applies to the other listed location.

224-238: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Keep these public Rust structs backward compatible. Adding fields to RegexPattern and TextSearchOptions breaks direct struct literals for Rust consumers; preserve a compatibility path or treat this as an intentional major-version bump.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/core/src/lib.rs` around lines 224 - 238, Keep RegexPattern and
TextSearchOptions backward compatible for Rust consumers by preserving support
for existing direct struct literals; either avoid exposing these new fields as
required public additions, provide defaults/builders/compatibility constructors,
or gate the change behind an intentional major-version bump. Update the public
API in crates/core/src/lib.rs so existing code using RegexPattern and
TextSearchOptions can still compile without specifying the newly added fields.
🧹 Nitpick comments (1)
crates/core/tests/text_search.rs (1)

606-660: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add an eager Include-overrides-global-Omit regression test.

The suite covers the override case for lazy regexes, but not for eager regexes. Since this PR adds separate eager capture logic, an eager-path regression where PreparedArtifactPolicy::Include is ignored would still pass these tests.

Proposed test
+#[test]
+fn prepared_eager_regex_artifacts_can_override_global_omit() {
+  let mut regex = RegexPattern::new(r"\bGamma-\d{4}\b");
+  regex.prepared_artifact_policy = PreparedArtifactPolicy::Include;
+  let patterns = vec![PatternEntry::Regex(regex)];
+  let options = TextSearchOptions {
+    regex_artifact_policy: RegexArtifactPolicy::Omit,
+    ..TextSearchOptions::default()
+  };
+
+  let artifacts =
+    TextSearch::prepare_artifacts(patterns.clone(), options).unwrap();
+  assert_eq!(artifacts.regex_sets.len(), 1);
+  assert!(
+    artifacts
+      .regex_sets
+      .first()
+      .is_some_and(|artifact| !artifact.bytes.is_empty())
+  );
+
+  let direct = TextSearch::new(patterns.clone(), options).unwrap();
+  let prepared =
+    TextSearch::with_prepared_artifacts(patterns, options, &artifacts).unwrap();
+  assert_eq!(
+    prepared.find_iter("Gamma-1234").unwrap(),
+    direct.find_iter("Gamma-1234").unwrap()
+  );
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/core/tests/text_search.rs` around lines 606 - 660, Add a regression
test in the text_search tests that covers eager regex artifacts honoring
PreparedArtifactPolicy::Include even when
TextSearchOptions::regex_artifact_policy is Omit; extend the existing
prepared_artifacts/with_prepared_artifacts coverage around
TextSearch::prepare_artifacts, TextSearch::new, and
TextSearch::with_prepared_artifacts to assert the eager path still produces
non-empty artifact bytes and matches direct search results, specifically for
RegexPattern-based eager regexes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/core/src/lib.rs`:
- Around line 2351-2364: The lazy regex path is still serializing prefilter
artifacts even when the prepared artifact policy is Omit. Update the prefilter
flow in the regex-loading logic around build_prefilter_regex,
capture_or_load_lazy_regex, and the
lazy_options.prepared_artifact_policy.should_capture decision so
capture_prepared is computed once up front and passed through to prefilter
construction. Make sure the same capture_prepared value is used for both the
primary prepared regex and the secondary prefilter regex, including the
equivalent code path that also applies to the other listed location.
- Around line 224-238: Keep RegexPattern and TextSearchOptions backward
compatible for Rust consumers by preserving support for existing direct struct
literals; either avoid exposing these new fields as required public additions,
provide defaults/builders/compatibility constructors, or gate the change behind
an intentional major-version bump. Update the public API in
crates/core/src/lib.rs so existing code using RegexPattern and TextSearchOptions
can still compile without specifying the newly added fields.

---

Nitpick comments:
In `@crates/core/tests/text_search.rs`:
- Around line 606-660: Add a regression test in the text_search tests that
covers eager regex artifacts honoring PreparedArtifactPolicy::Include even when
TextSearchOptions::regex_artifact_policy is Omit; extend the existing
prepared_artifacts/with_prepared_artifacts coverage around
TextSearch::prepare_artifacts, TextSearch::new, and
TextSearch::with_prepared_artifacts to assert the eager path still produces
non-empty artifact bytes and matches direct search results, specifically for
RegexPattern-based eager regexes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4ca2017e-615b-4603-abc5-d05c285c4c3e

📥 Commits

Reviewing files that changed from the base of the PR and between 047f519 and 654d9b1.

📒 Files selected for processing (2)
  • crates/core/src/lib.rs
  • crates/core/tests/text_search.rs

@jan-kubica

Copy link
Copy Markdown
Contributor Author

CC on behalf of @jan-kubica

Addressed the actionable review items in d4562c4:

  • applied the resolved regex artifact capture policy to lazy prefilter_regex artifacts and added regression coverage for omitted prefilter artifacts
  • extracted source RegexSet fallback construction into a shared helper
  • added eager PreparedArtifactPolicy::Include over global RegexArtifactPolicy::Omit coverage

The public struct-literal compatibility note is acknowledged. This Rust core API is still being shaped before stable release; callers should prefer constructors and default-update options while these policies settle.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4562c43a6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/core/src/lib.rs
Comment on lines +1618 to +1620
|| regex_options.prepared_artifact_policy
!= PreparedArtifactPolicy::Inherit
}) || pattern.alternation_count > options.max_alternations

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep artifact policy from changing regex match order

Setting prepared_artifact_policy on an otherwise eager regex now sends it to isolated_regex. In mixed lists, shared regex engines are built before isolated ones, while merge_and_select tie-breaks equal start/equal length matches by incoming engine order; a cache-only choice like [Regex("abc", Omit), Regex("abc")] can therefore make find_iter/replace_all prefer the second pattern instead of the lower original index. Preserve original-pattern tie breaking or avoid reordering when applying the artifact policy.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant