
fix(parallel): drop MinGW emutls thread_local on the worker NA path #253

Merged: ms609 merged 1 commit into cpp-search from claude/fix-na-emutls-threadlocal on Jun 20, 2026


ms609 (Owner) commented Jun 20, 2026

Problem

Parallel NA search (nThreads >= 2) intermittently aborts with STATUS_HEAP_CORRUPTION on Windows/MinGW. Serial search is unaffected, as is the EW path.

Root cause

The per-thread scratch buffers in the TBR kernel (ts_tbr.cpp, ts_fitch.cpp) and exact_verify_sweep's optimum cache were declared as function-local static thread_local variables. On MinGW these resolve via emutls, whose thread_local teardown across std::thread spawn/exit corrupts the heap. EW never trips it because its TLS footprint is light; the NA path does, because exact_verify_sweep adds a thread_local std::unordered_set on top of further scratch buffers.
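For readers unfamiliar with the pattern, here is a minimal sketch of the pre-fix shape. The function names and the scratch contents are hypothetical; only the function-local `static thread_local` declaration and the short-lived std::thread workers mirror the PR text. It runs cleanly on toolchains with native TLS and is not a reproducer for the MinGW corruption.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a TBR-kernel step that needs scratch space.
int kernel_with_tls_scratch(int n) {
    // One buffer per thread, created on first use and destroyed at thread exit.
    // Per this PR, MinGW resolves such variables via emutls.
    static thread_local std::vector<int> scratch;
    scratch.assign(static_cast<std::size_t>(n), 1);  // reuses capacity across calls
    int sum = 0;
    for (int v : scratch) sum += v;
    return sum;
}

// Spawns short-lived std::thread workers (hypothetical driver); each
// worker's exit triggers teardown of its thread_local buffer.
std::vector<int> run_workers_tls(int n_threads) {
    assert(n_threads > 0);
    std::vector<int> results(static_cast<std::size_t>(n_threads), 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&results, t] {
            // Each thread writes a distinct element, so there is no data race.
            results[static_cast<std::size_t>(t)] = kernel_with_tls_scratch(10 + t);
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```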

Fix

  • Convert all worker-reachable scratch from static thread_local to plain function-locals. Each worker owns its own call frame, so these buffers are per-thread-safe without emutls. The cost of per-clip reallocation was measured at ≤1.6% on 88-tip data and ~0% in typical runs.
  • Move exact_verify_sweep's optimum memoization (evs_false_cache / evs_last_fp) to mutable members on DataSet. Each worker's ds_local has the same per-worker, cross-replicate lifetime the thread_local had, so persistence is unchanged, but emutls is no longer involved.
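A minimal sketch of both fix bullets follows, under stated assumptions. DataSet, ds_local, evs_false_cache, evs_last_fp and exact_verify_sweep are names taken from the PR, but their types, the hypothetical `tbr_kernel` and `run_workers` helpers, the placeholder verification test, and the idea that evs_last_fp invalidates the cache when a fingerprint changes are all illustrative guesses, not the project's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <unordered_set>
#include <vector>

struct DataSet {
    std::vector<int> weights;  // hypothetical payload
    // Memo state formerly held in a static thread_local; `mutable` lets
    // routines that take a const DataSet& still update it.
    mutable std::unordered_set<std::uint64_t> evs_false_cache;
    mutable std::uint64_t evs_last_fp = 0;
};

// Hypothetical kernel: scratch is a plain local, so every call (and hence
// every worker thread) gets its own buffer, with no TLS involved.
int tbr_kernel(const DataSet& ds) {
    std::vector<int> scratch(ds.weights.begin(), ds.weights.end());
    int sum = 0;
    for (int w : scratch) sum += w;
    return sum;
}

// Hypothetical exact_verify_sweep: remembers candidates that failed, and
// (assumption) resets that memo whenever the fingerprint `fp` changes.
bool exact_verify_sweep(const DataSet& ds, std::uint64_t fp, std::uint64_t candidate) {
    if (fp != ds.evs_last_fp) {
        ds.evs_false_cache.clear();
        ds.evs_last_fp = fp;
    }
    if (ds.evs_false_cache.count(candidate)) return false;  // cached rejection
    bool ok = (candidate % 2 == 0);  // placeholder for the real exact check
    if (!ok) ds.evs_false_cache.insert(candidate);
    return ok;
}

// Each worker copies the shared data into its own ds_local, so the memo
// persists across that worker's replicates and is never shared.
std::vector<std::size_t> run_workers(const DataSet& ds, int n_threads, int n_replicates) {
    assert(n_threads > 0);
    std::vector<std::size_t> cache_sizes(static_cast<std::size_t>(n_threads), 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&ds, &cache_sizes, t, n_replicates] {
            DataSet ds_local = ds;  // per-worker copy, owned by this thread
            for (int rep = 0; rep < n_replicates; ++rep) {
                (void)tbr_kernel(ds_local);
                exact_verify_sweep(ds_local, 42, static_cast<std::uint64_t>(rep));
            }
            cache_sizes[static_cast<std::size_t>(t)] = ds_local.evs_false_cache.size();
        });
    }
    for (auto& w : workers) w.join();
    return cache_sizes;
}
```

The point of this shape is that nothing relies on thread storage duration: scratch lives on each worker's stack, and the memo lives in an object the worker already owns exclusively, so its lifetime follows the worker loop rather than thread teardown.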

Verification (clean builds on this base: rm src/*.o; CCACHE_DISABLE=1; --preclean)

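|                | parallel NA         | parallel EW | serial scores                | NA perf                                          |
|----------------|---------------------|-------------|------------------------------|--------------------------------------------------|
| base (da0f203) | crashes (iter ~4–8) | 200/200     | NA=79, EW=78                 |                                                  |
| this PR        | 120/120             | 200/200     | NA=79, EW=78 (bit-identical) | 4.15 s (cache intact, vs 5.81 s cache-disabled) |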
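As a quick reading of the NA perf figures, the cache's contribution on this benchmark works out as follows (plain arithmetic on the two reported timings, nothing more):

$$
\frac{5.81\ \text{s}}{4.15\ \text{s}} = 1.40,
\qquad
\frac{5.81\ \text{s} - 4.15\ \text{s}}{5.81\ \text{s}} = \frac{1.66}{5.81} \approx 28.6\%
$$

That is, keeping the memoization intact makes the NA run about 1.4× faster, saving roughly 29% of the cache-disabled wall time.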

🤖 Generated with Claude Code

The parallel NA search (nThreads>=2) intermittently aborted with
STATUS_HEAP_CORRUPTION.  Root cause: the per-thread scratch in the TBR
kernel (ts_tbr.cpp, ts_fitch.cpp) and exact_verify_sweep's optimum cache
were function-local `static thread_local`.  On MinGW these resolve via
emutls, whose thread_local teardown across std::thread spawn/exit corrupts
the heap.  EW is unaffected (light TLS); the NA path trips it because
exact_verify_sweep adds a thread_local unordered_set plus more scratch.

Fix: convert all worker-reachable scratch to plain function-locals (each
worker owns its call frame -> per-thread-safe; per-clip realloc measured
<=1.6% on 88-tip data, ~0% typical).  Move exact_verify_sweep's optimum
memoization to mutable members on DataSet so it keeps the same per-worker,
cross-replicate persistence the thread_local had, without emutls.

Verified on clean builds (rm src/*.o; CCACHE_DISABLE=1; --preclean):
parallel NA survives 120/120 (previously crashed around iter ~4-8), EW 200/200,
serial scores bit-identical, NA perf 4.15s (cache intact, vs 5.81s cache-disabled).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ms609 merged commit d6fa512 into cpp-search on Jun 20, 2026
1 of 10 checks passed
ms609 deleted the claude/fix-na-emutls-threadlocal branch on June 20, 2026 at 10:45