feat(regex): add deadline-aware Regex::new_with_deadline API #25

Draft

youichi-uda wants to merge 1 commit into quickwit-inc:master from youichi-uda:add-regex-deadline-timeout

Conversation

@youichi-uda

Motivation

Pathological regex patterns can take multiple seconds in the NFA-to-DFA expansion before the existing STATE_LIMIT / CompiledTooBig guards reject them. The existing limits bound memory but not wall-clock time. For example, on a release build (x86_64 Linux):

| Pattern | Bytes | Outcome |
| --- | --- | --- |
| `\w{0,5}\w{0,5}\w{0,5}\w{0,5}` | 28 | 2.4 s then `Err(TooManyStates(1000))` |
| `\w{0,4}\w{0,4}\w{0,4}\w{0,4}\w{0,4}\w{0,4}` | 42 | 3.8 s then `Err(TooManyStates(1000))` |
| `\w{0,3}` × 8 | 56 | 5.0 s then `Err(TooManyStates(1000))` |

For services that compile user-supplied regex patterns on a request thread (full-text search _search endpoints, log query DSLs, etc.), this is a low-payload DoS vector that caller-side caps cannot fully fix, because the expensive work happens inside Regex::new before any caller cap can intervene.

We hit this in ferrosearch three times in three days through fuzzing of a _search regex DSL endpoint. Each time we tightened a caller-side input cap (commits abbf6c5, ae0a3d4, 207841c) the next fuzz wave found a new escape vector — pattern length, alternation count, and nesting depth are all loosely correlated with build time, so input-shape caps are an arms race. A wall-clock deadline is the only durable fix.

API design

```rust
impl Regex {
    /// Existing API: unchanged.
    pub fn new(re: &str) -> Result<Regex, Error> { ... }

    /// New API: identical to `new`, but aborts with `Error::DeadlineExceeded`
    /// if `deadline` passes during DFA construction.
    pub fn new_with_deadline(re: &str, deadline: Instant) -> Result<Regex, Error> { ... }
}

pub enum Error {
    // ... existing variants ...
    DeadlineExceeded,
}
```
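Caller-side, an `Instant` deadline composes naturally with an outer request deadline. A minimal standalone sketch of that composition (the `effective_deadline` helper and its argument names are hypothetical, not part of this PR):

```rust
use std::time::{Duration, Instant};

/// Compose an outer request deadline with a per-query compile budget:
/// whichever expires first wins. (`effective_deadline` is a hypothetical
/// helper for illustration, not part of this PR.)
fn effective_deadline(req_deadline: Instant, per_query_budget: Duration) -> Instant {
    req_deadline.min(Instant::now() + per_query_budget)
}

fn main() {
    let now = Instant::now();
    let req_deadline = now + Duration::from_secs(5);

    // A 250 ms per-query budget is tighter than the 5 s request deadline,
    // so the budget wins.
    let d = effective_deadline(req_deadline, Duration::from_millis(250));
    assert!(d < req_deadline);

    // A generous budget defers to the outer request deadline instead.
    let d = effective_deadline(req_deadline, Duration::from_secs(60));
    assert_eq!(d, req_deadline);
    println!("ok");
}
```

The resulting `Instant` would be what the caller passes straight into `new_with_deadline`.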

Design notes:

  • Backwards-compatible: Regex::new, with_size_limit, the public Error enum (additive), and the Automaton impl are unchanged for existing callers. The only API surface added is Regex::new_with_deadline and Error::DeadlineExceeded.
  • Instant deadline (not Duration): lets callers compose with an outer request deadline (req_deadline.min(per_query_budget)) without bookkeeping. Matches the convention used by tokio::time::timeout_at, std::sync::Condvar::wait_timeout_until, etc.
  • Internal: DfaBuilder::build() now delegates to build_with_deadline(None), which is where the new check lives.
  • Where the check fires: inside the byte-loop of DFA construction, batched every DEADLINE_CHECK_INTERVAL = 1024 transitions. This is the hot loop where the STATE_LIMIT guard already lives, and is by far the dominant cost on adversarial patterns.
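The shape of the batched check can be sketched in isolation. This is a self-contained simulation of the hot loop under the interval described above, not the actual `DfaBuilder` code:

```rust
use std::time::Instant;

/// How often the loop consults the clock; calling `Instant::now()` on
/// every transition would dominate cheap builds.
const DEADLINE_CHECK_INTERVAL: u64 = 1024;

/// Standalone stand-in for the DFA construction loop (not the real
/// builder): do per-transition work, and only check the deadline every
/// DEADLINE_CHECK_INTERVAL transitions.
fn run_build_loop(deadline: Option<Instant>, transitions: u64) -> Result<u64, &'static str> {
    for step in 1..=transitions {
        // ... per-transition work would happen here ...
        if step % DEADLINE_CHECK_INTERVAL == 0 {
            if let Some(d) = deadline {
                if Instant::now() >= d {
                    return Err("DeadlineExceeded");
                }
            }
        }
    }
    Ok(transitions)
}

fn main() {
    // No deadline: the loop runs to completion, as Regex::new does today.
    assert_eq!(run_build_loop(None, 10_000), Ok(10_000));

    // An already-expired deadline aborts at the first check interval,
    // i.e. within DEADLINE_CHECK_INTERVAL transitions of starting.
    let expired = Instant::now();
    assert_eq!(run_build_loop(Some(expired), 1_000_000), Err("DeadlineExceeded"));
    println!("ok");
}
```

The batching is also why the observed deadline overshoot is bounded by roughly one check interval rather than zero.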

I considered alternatives:

  • Adding the check inside Compiler::c (NFA build) — the NFA build is mostly linear in HIR size and already bounded by CompiledTooBig. Empirically the trap is in DFA expansion, not NFA compilation, so a check there would be effectively dead code. Happy to add a symmetric check there if reviewers prefer.
  • A with_deadline builder pattern — would let us also expose the size limit (currently hardcoded to 10 << 20). Out of scope for this PR but a natural follow-up; this PR keeps the new API minimal so it can be reviewed and shipped quickly.

Backwards compatibility

  • Regex::new is unchanged (delegates through with_size_limit_and_deadline(size, re, None)).
  • Error is #[derive(Debug)] only, no #[non_exhaustive]. Adding a variant to a public non-#[non_exhaustive] enum is technically a minor breaking change for downstream matches without _ arms. Given (a) tantivy-fst's README explicitly steers new users to the upstream fst crate, (b) Error is pub enum without exhaustiveness guarantees in any docs, and (c) the practical alternative is leaving callers vulnerable to the DoS, I think this is the right tradeoff. If reviewers prefer, I can add #[non_exhaustive] to Error in the same release as a hardening pass.
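For context on why the added variant is technically breaking: a standalone illustration of the exhaustive-match problem (the enum here is a stripped-down stand-in, not tantivy-fst's actual Error):

```rust
/// Stripped-down stand-in for the crate's error enum (not the real
/// tantivy-fst `Error`), used only to illustrate the compatibility point.
#[derive(Debug, PartialEq)]
enum Error {
    TooManyStates(usize),
    DeadlineExceeded, // the new variant
}

fn describe(e: &Error) -> &'static str {
    // Because the enum is not #[non_exhaustive], downstream code may match
    // exhaustively. Code written before DeadlineExceeded existed would stop
    // compiling at this match unless it already had a `_` arm.
    match e {
        Error::TooManyStates(_) => "too many states",
        Error::DeadlineExceeded => "deadline exceeded",
    }
}

fn main() {
    assert_eq!(describe(&Error::TooManyStates(1000)), "too many states");
    assert_eq!(describe(&Error::DeadlineExceeded), "deadline exceeded");
    println!("ok");
}
```

With `#[non_exhaustive]` on the enum, the compiler would instead force downstream matches to carry a `_` arm, making future variant additions non-breaking.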

Performance measurement

Microbench (1000 iterations, release build, x86_64 Linux):

| Pattern | `Regex::new` | `new_with_deadline(60s)` | Overhead |
| --- | --- | --- | --- |
| `hello.*world` | 238 µs | 243 µs | +2.1% |
| `[a-zA-Z0-9]+` | 10.6 µs | 10.6 µs | ~0% |
| `(foo\|bar\|baz\|qux)+` | 20.5 µs | 20.6 µs | ~0% |

Pathological pattern \w{0,5}\w{0,5}\w{0,5}\w{0,5}:

| Call | Latency | Result |
| --- | --- | --- |
| `Regex::new(...)` | 2447 ms | `Err(TooManyStates(1000))` |
| `Regex::new_with_deadline` (50 ms) | 55 ms | `Err(DeadlineExceeded)` |
| `Regex::new_with_deadline` (250 ms) | 253 ms | `Err(DeadlineExceeded)` |

44× latency reduction on the adversarial input; deadline overshoot is ~5 ms (one check interval).

Test results

Added 4 unit tests in src/regex/mod.rs:

  • deadline_aborts_pathological_pattern: 50 ms deadline aborts the slow pattern in well under the 2.4 s baseline.
  • deadline_does_not_affect_easy_pattern: long deadline + small pattern compiles cleanly.
  • new_unchanged_for_easy_pattern: backwards-compat smoke for the existing API.
  • deadline_exceeded_error_is_distinct: Display / Debug formatting for the new variant.

Full suite:

cargo test --release   ->   141 passed, 5 ignored (2 suites, 0.27s)
cargo test             ->   141 passed, 5 ignored (2 suites, 1.67s)

No existing tests modified.

Defense in depth

The three caller-side input-shape caps in ferrosearch will remain even after this lands. The deadline is the floor; the caller caps still serve as cheap early rejection, so obvious garbage never reaches the regex engine at all. Both layers are valuable, but the deadline is the durable one.


Filed as draft for design review on the API shape (especially the non_exhaustive question and whether to expose a builder). Happy to iterate.
