fix(parallel): drop MinGW emutls thread_local on the worker NA path #253
Merged
The parallel NA search (nThreads>=2) intermittently aborted with STATUS_HEAP_CORRUPTION.

Root cause: the per-thread scratch in the TBR kernel (ts_tbr.cpp, ts_fitch.cpp) and exact_verify_sweep's optimum cache were function-local `static thread_local`. On MinGW these resolve via emutls, whose thread_local teardown across std::thread spawn/exit corrupts the heap. EW is unaffected (light TLS); the NA path trips it because exact_verify adds a thread_local unordered_set plus more scratch.

Fix: convert all worker-reachable scratch to plain function-locals (each worker owns its call frame -> per-thread-safe; per-clip realloc measured <=1.6% on 88-tip data, ~0% typical). Move exact_verify_sweep's optimum memoization to mutable members on DataSet so it keeps the same per-worker, cross-replicate persistence the thread_local had, without emutls.

Verified on clean builds (rm src/*.o; CCACHE_DISABLE=1; --preclean): parallel NA survives 120/120 (was iter ~4-8), EW 200/200, serial scores bit-identical, NA perf 4.15s (cache intact, vs 5.81s cache-disabled).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
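As a rough illustration of the scratch conversion described in the commit message, the sketch below contrasts the removed pattern (shown only as a comment) with a plain function-local buffer. `score_clip` and `run_workers` are hypothetical stand-ins, not functions from `ts_tbr.cpp` or `ts_fitch.cpp`, and the arithmetic is a placeholder for the real kernel work.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Before (the pattern the PR removes): scratch persisted per thread via TLS,
// which on MinGW is routed through emutls.
//   static thread_local std::vector<int> scratch;

// After: scratch is an ordinary local, so every call (and therefore every
// worker thread) builds its own buffer inside its own call frame.
// score_clip is a hypothetical stand-in for a kernel routine.
int score_clip(const std::vector<int>& states) {
    std::vector<int> scratch(states.size());  // reallocated on each call
    for (std::size_t i = 0; i < states.size(); ++i) {
        scratch[i] = states[i] * 2;           // placeholder computation
    }
    return std::accumulate(scratch.begin(), scratch.end(), 0);
}

// Runs score_clip on n_threads workers; each worker writes only its own slot
// in results, so no synchronization beyond join() is needed.
std::vector<int> run_workers(int n_threads, const std::vector<int>& states) {
    assert(n_threads > 0);
    std::vector<int> results(static_cast<std::size_t>(n_threads), 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t) {
        pool.emplace_back([&results, &states, t] {
            results[static_cast<std::size_t>(t)] = score_clip(states);
        });
    }
    for (auto& th : pool) th.join();
    return results;
}
```

The trade-off is one allocation per call instead of one per thread lifetime, which is the per-clip reallocation cost the commit message reports as small.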
Problem

Parallel NA search (`nThreads >= 2`) intermittently aborts with `STATUS_HEAP_CORRUPTION` on Windows/MinGW. Serial is unaffected; EW is unaffected.

Root cause

The per-thread scratch in the TBR kernel (`ts_tbr.cpp`, `ts_fitch.cpp`) and `exact_verify_sweep`'s optimum cache were function-local `static thread_local`. On MinGW these resolve via emutls, whose `thread_local` teardown across `std::thread` spawn/exit corrupts the heap. EW never trips it (light TLS); the NA path does, because `exact_verify_sweep` adds a `thread_local std::unordered_set` plus more scratch.

Fix

- Convert all worker-reachable scratch from `static thread_local` to plain function-locals. Each worker owns its call frame, so they're per-thread-safe with no emutls. Per-clip reallocation cost measured ≤1.6% on 88-tip data, ~0% typical.
- Move `exact_verify_sweep`'s optimum memoization (`evs_false_cache` / `evs_last_fp`) to `mutable` members on `DataSet`. Each worker's `ds_local` has the same per-worker, cross-replicate lifetime the `thread_local` had, so persistence is unchanged, without emutls.

Verification (clean builds: `rm src/*.o; CCACHE_DISABLE=1; --preclean`, on this base)

- Parallel NA: survives 120/120 (was iter ~4-8)
- EW: 200/200
- Serial scores: bit-identical
- NA perf: 4.15s with the cache intact, vs 5.81s cache-disabled

🤖 Generated with Claude Code
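The memoization move can be sketched as follows. This is a minimal illustration under stated assumptions: the `DataSet` here is a heavily reduced hypothetical, the member types (`std::unordered_set<std::uint64_t>`, `std::uint64_t`) are guesses, and `sweep_cached` and `count_hits_in_worker` are invented stand-ins for `exact_verify_sweep` and the worker loop. Only the member names `evs_false_cache` and `evs_last_fp` and the `ds_local` copy come from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <unordered_set>

// Hypothetical, reduced DataSet; member types are assumptions.
struct DataSet {
    // mutable: the sweep takes a const DataSet& but may still update the cache.
    mutable std::unordered_set<std::uint64_t> evs_false_cache;
    mutable std::uint64_t evs_last_fp = 0;
};

// Hypothetical stand-in for exact_verify_sweep. Returns true when the
// expensive check was skipped because the fingerprint was already cached.
bool sweep_cached(const DataSet& ds, std::uint64_t fingerprint) {
    if (ds.evs_false_cache.count(fingerprint) != 0) return true;  // cache hit
    // ... the expensive verification would run here ...
    ds.evs_false_cache.insert(fingerprint);
    ds.evs_last_fp = fingerprint;
    return false;
}

// One worker copies the shared DataSet into ds_local, so the cache belongs to
// that worker alone and persists across its replicates, with no thread_local.
int count_hits_in_worker(const DataSet& shared, int replicates) {
    assert(replicates >= 0);
    int hits = 0;
    std::thread worker([&shared, &hits, replicates] {
        DataSet ds_local = shared;  // per-worker copy, including the cache
        for (int r = 0; r < replicates; ++r) {
            if (sweep_cached(ds_local, 42u)) ++hits;
        }
    });
    worker.join();  // hits is read only after the worker has finished
    return hits;
}
```

In this sketch the first replicate misses and later ones hit, while the shared `DataSet` stays untouched because each worker mutates only its own copy.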