
test(mutants): bounded catches for repo-wide survivors + run the lane under nextest #151

Merged
heyoub merged 2 commits into main from feat/repo-wide-ratchet-bounded-catches
Jun 30, 2026

@heyoub heyoub commented Jun 30, 2026

What & why

The repo-wide ratchet smoke shard 0 went red on its first complete cloud run (now that Blacksmith is fast enough to finish the 85-min lane). It scored 94% — well above the 75% floor — but hard-failed on 5 TIMEOUT mutants: genuine livelock mutants in the writer / visibility / frontier paths.

Root cause (the deeper issue)

The bounded tests alone would not have fixed this. The mutation lane was the only test lane in the house still on raw cargo test. Under cargo test, one test hanging on a mutation never lets the shared lib-unittests binary exit, so it masks every fast-failing assertion in that binary, and a killable mutant reads as a TIMEOUT survivor (our policy correctly treats timed_out > 0 as a failure, not as caught). lanes.rs even carried a stale comment betting "cargo-mutants treats a timeout as caught" — false under that policy.
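
The policy described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: `ShardOutcome`, `LaneVerdict`, `judge`, and the choice to score only caught vs. missed mutants are all assumptions made for the example.

```rust
/// Counts reported for one mutation shard (hypothetical shape).
#[derive(Debug, Clone, Copy)]
struct ShardOutcome {
    caught: u32,
    missed: u32,
    timed_out: u32,
}

/// Verdict for the lane (hypothetical names).
#[derive(Debug, PartialEq)]
enum LaneVerdict {
    Pass,
    BelowFloor { score_pct: u32 },
    TimedOut { count: u32 },
}

/// Judge a shard: any timeout fails the lane, regardless of the score.
fn judge(outcome: ShardOutcome, floor_pct: u32) -> LaneVerdict {
    // A timeout is never counted as a catch: it fails the lane outright.
    if outcome.timed_out > 0 {
        return LaneVerdict::TimedOut { count: outcome.timed_out };
    }
    let total = outcome.caught + outcome.missed;
    // An empty shard has nothing to score; this sketch treats it as 100%.
    let score_pct = if total == 0 { 100 } else { outcome.caught * 100 / total };
    if score_pct < floor_pct {
        LaneVerdict::BelowFloor { score_pct }
    } else {
        LaneVerdict::Pass
    }
}
```

With illustrative counts such as `ShardOutcome { caught: 94, missed: 6, timed_out: 5 }` and a floor of 75, `judge` returns `TimedOut { count: 5 }` even though the score clears the floor, which is the shape of the shard 0 failure.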

The fix (two parts)

1. Run mutants under nextest. --test-tool nextest + the ci profile's terminate-after (pinned via NEXTEST_PROFILE=ci in the runner). Per-test process isolation means a livelock is reaped as a bounded per-test timeout and the fast assertion convicts the mutant first. This aligns the mutation lane with every other test lane (run_nextest_ci) and fixes hang-masking for the other 47 shards too.

  • .cargo/mutants.toml, plan.rs, mod.rs fixtures: --test-tool nextest
  • run.rs: NEXTEST_PROFILE=ci (its slow-timeout overrides keep the unmutated baseline from tripping terminate-after)
  • lanes.rs: corrected the stale timeout-vs-caught note

2. Seven bounded assertion catches — each survivor flips TIMEOUT/MISSED → CAUGHT by a sub-millisecond assertion instead of a 203s hang:

| seam | class | catch |
|---|---|---|
| `writer_queue_len` -> `Some(0)` | TIMEOUT | direct-read, non-gated so the `--no-default-features` lane (where the existing catcher is `dangerous-test-hooks`-gated) kills it too |
| `SequenceGate::publish_on_lanes` `>` vs `>=` | TIMEOUT | publish-at-frontier returns `Ok` + advances visible |
| unfenced single-append `global_seq + 1` -> `* 1` | TIMEOUT | global `visible_sequence() == 1` after first event |
| `recreate_restart_segment` -> `None` | TIMEOUT | restart precondition yields `Some` |
| `MonotonicClock::process_boot_ns` | MISSED | delegates the real anchor, not a stub |
| sim `next_seq` `+=` vs `*=` | MISSED | op-trace digest golden pin |
| cursor `stop_and_join` -> `Ok(())` | MISSED | surfaces the worker's startup error |

prepared-batch-items (empty-slice) needs no new test: the existing prepared_batch_dedupes_entity_and_scope_strings already asserts on items(); it was only masked by a sibling hang, which nextest now unmasks.
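
To illustrate what a bounded assertion catch looks like, here is a toy sketch of the `global_seq + 1` vs `* 1` case. The `Frontier` type and both `publish` methods are hypothetical stand-ins written for this example; only the asserted property (`visible_sequence() == 1` after the first event) comes from the table above.

```rust
/// Toy stand-in for a visibility frontier (hypothetical, not batpak's type).
struct Frontier {
    visible: u64,
}

impl Frontier {
    fn new() -> Self {
        Self { visible: 0 }
    }

    /// Unmutated behaviour: after publishing event `global_seq`,
    /// `global_seq + 1` events are visible.
    fn publish(&mut self, global_seq: u64) {
        self.visible = global_seq + 1;
    }

    /// The mutant, written out by hand for illustration: `+ 1` became `* 1`,
    /// so publishing event 0 leaves the frontier stuck at 0.
    fn publish_mutated(&mut self, global_seq: u64) {
        self.visible = global_seq * 1;
    }

    fn visible_sequence(&self) -> u64 {
        self.visible
    }
}
```

A test that waits for the frontier to advance would spin forever under the mutant. Reading `visible_sequence()` directly after the first publish and asserting it equals 1 fails immediately instead, so the mutant is convicted by an assertion rather than by a timeout.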

Verification (local, before push)

  • All 7 new tests pass on real code: cargo test -p batpak --all-features (incl. the sim op-trace golden constant).
  • The writer-queue-len test compiles under --no-default-features.
  • All 34 xtask mutants fixture/policy tests stay green (nextest switch is internally consistent).
  • Pre-commit gates clean: traceability-check: ok, structural-check: ok (overclaim, triangulation, capability-snapshot, …).

What proves it in CI

The four required gates (ci-fast, meta-gate, gauntlet, Windows) gate the merge, but they do not prove the fix. The proof that the 5 TIMEOUTs are gone is the non-required Mutation smoke (repo-wide ratchet) lane: watch that one go green (and faster, with the 5×203s timeouts eliminated).

🤖 Generated with Claude Code

… under nextest

Repo-wide ratchet shard 0 failed on 5 TIMEOUT mutants while scoring 94% (above
the 75% floor): genuine livelock mutants in the writer / visibility / frontier
paths. Root cause is that the mutation lane was the ONLY test lane in the house
still on raw `cargo test`, where one hung test never lets the shared test binary
exit, so it masks every fast-failing assertion and killable mutants read as
TIMEOUT survivors (our policy correctly treats a timeout as a failure, not as
caught).

Two-part fix.

1. Run mutants under `--test-tool nextest` with the `ci` profile (per-test
   process isolation + terminate-after), aligning the lane with every other
   test lane (run_nextest_ci). A mutation-induced livelock is now reaped as a
   bounded per-test timeout and the fast assertion convicts the mutant first.
   - .cargo/mutants.toml: test_tool = nextest
   - plan.rs / mod.rs fixtures: --test-tool nextest
   - run.rs: NEXTEST_PROFILE=ci (its slow-timeout overrides keep the unmutated
     baseline from tripping terminate-after)
   - lanes.rs: correct the stale "cargo-mutants treats a timeout as caught" note

2. Bounded assertion catches converting each survivor TIMEOUT/MISSED -> CAUGHT
   by a sub-millisecond assertion instead of a 203s hang:
   - writer_queue_len Some(0): a NON-gated direct-read test so the
     --no-default-features lane (where the dangerous-test-hooks catcher is not
     compiled) still kills it
   - SequenceGate publish-at-frontier (`>` vs `>=`)
   - unfenced single-append global visibility frontier (`global_seq + 1` vs `* 1`)
   - recreate_restart_segment -> None
   - MonotonicClock::process_boot_ns delegation (MISSED)
   - sim workload next_seq `+=` vs `*=` op-trace digest pin (MISSED)
   - cursor stop_and_join error propagation vs Ok(()) (MISSED)

prepared-batch-items (empty-slice) needs no new test: the existing
prepared_batch_dedupes_entity_and_scope_strings already asserts on items() and
was only masked by a sibling hang, which nextest now unmasks.

Verified locally: all 7 new tests pass on real code (cargo test -p batpak
--all-features), the writer-queue-len test compiles under --no-default-features,
and the 34 xtask mutants fixture/policy tests stay green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…of timing out

The first nextest run of the cured lane still hit 7 timeouts (5 -> 7) and took
2h. Root cause: I pinned NEXTEST_PROFILE=ci, whose `fail-fast = false` is the
exact anti-pattern cargo-mutants warns against for mutation. With fail-fast off,
nextest runs the WHOLE suite per mutant, so a sibling test the mutation
livelocked keeps the run alive until cargo-mutants' outer timeout — re-creating
the cargo-test masking one level up, and the fast assertion that already caught
the mutant is never seen.

Local proof: applying `append.rs global_seq + 1 -> * 1` by hand and running
`cargo nextest run` with `fail-fast = true` convicts the mutant in 0.158s of
test time (exit non-zero after 48/1585 tests, cancelling the rest) — it never
even reaches the livelock test. cargo-mutants' docs confirm it honors the
profile's fail-fast and recommend keeping it on for mutation.
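
The masking effect can be modelled with a deliberately simplified sketch. It assumes tests run one after another (nextest actually runs them in parallel) and that a hung test holds the run until the outer timeout; `TestOutcome`, `MutantVerdict`, and `run_suite` are hypothetical names for this illustration only.

```rust
/// Outcome of one test when run against a mutant (toy model).
#[derive(Clone, Copy, Debug, PartialEq)]
enum TestOutcome {
    Pass { secs: f64 },
    Fail { secs: f64 },
    /// Never finishes on its own; in this model it consumes the outer budget.
    Hang,
}

#[derive(Debug, PartialEq)]
enum MutantVerdict {
    Caught,
    Missed,
    Timeout,
}

/// Run the suite sequentially and report how the mutant would be classified.
fn run_suite(tests: &[TestOutcome], fail_fast: bool, outer_timeout_secs: f64) -> MutantVerdict {
    let mut elapsed = 0.0_f64;
    let mut saw_failure = false;
    for test in tests {
        match *test {
            TestOutcome::Pass { secs } => elapsed += secs,
            TestOutcome::Fail { secs } => {
                elapsed += secs;
                saw_failure = true;
                // With fail-fast on, the first failure ends the run.
                if fail_fast {
                    break;
                }
            }
            // A hung test keeps the run alive until the outer timeout,
            // hiding any failure that was already observed.
            TestOutcome::Hang => return MutantVerdict::Timeout,
        }
        if elapsed > outer_timeout_secs {
            return MutantVerdict::Timeout;
        }
    }
    if saw_failure {
        MutantVerdict::Caught
    } else {
        MutantVerdict::Missed
    }
}
```

For a suite of `[Pass, Fail, Hang]`, the model yields `Caught` with `fail_fast = true` and `Timeout` with `fail_fast = false`, which mirrors the difference between the 0.158s conviction and the outer-timeout survivor described above.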

Fix: a dedicated `[profile.mutants]` in .config/nextest.toml —
  - fail-fast = true (the lever: convict on the first failing test)
  - slow-timeout terminate-after (backstop for a pure-hang mutant no assertion
    catches: the hung test is reaped as a per-test timeout-failure)
  - the known-slow-surface overrides mirrored from the ci profile so the
    unmutated baseline (which runs every test, fail-fast never triggering) cannot
    trip terminate-after
pinned via NEXTEST_PROFILE=mutants in the mutants runner.
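
As a rough sketch of what pinning the profile in the runner might look like: the function name `mutants_command` is hypothetical, and the real runner in run.rs may build its command differently. Only `--test-tool nextest` and the `NEXTEST_PROFILE` variable are taken from the text above.

```rust
use std::process::Command;

/// Build (but do not run) a cargo-mutants invocation that uses nextest
/// and pins the nextest profile through the environment.
fn mutants_command(nextest_profile: &str) -> Command {
    let mut cmd = Command::new("cargo");
    // Ask cargo-mutants to run each mutant's tests under nextest.
    cmd.args(["mutants", "--test-tool", "nextest"]);
    // nextest reads its profile from NEXTEST_PROFILE when no --profile is given.
    cmd.env("NEXTEST_PROFILE", nextest_profile);
    cmd
}
```

Calling `mutants_command("mutants")` gives a command equivalent to `NEXTEST_PROFILE=mutants cargo mutants --test-tool nextest`; keeping the profile name as a parameter makes it easy to see which profile a given lane is pinned to.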

This converts the TIMEOUT survivors to fast assertion catches AND fixes the 2h
wall-clock (caught mutants exit at first failure, in ms). The MISSED cures from
the prior commit already took (94% -> 100%, 0 missed); this closes the TIMEOUT
side.

Validated locally: the `mutants` profile parses and runs
(`cargo nextest run --profile mutants` => "nextest profile: mutants", test PASS).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
@heyoub heyoub merged commit 0f40785 into main Jun 30, 2026
39 checks passed