C++ search #210
Draft
ms609 wants to merge 766 commits into
The parameter was added to SearchControl() in R/SearchControl.R (with roxygen @param) but the Rd file was not regenerated. All GHA R-CMD-check platforms reported 'Codoc mismatches from Rd file SearchControl.Rd'. Manually add pruneReinsertTbrMoves = 5L to \usage and its \item to \arguments to match the function signature.
Findings: Wagner 3.6-5.2x more expensive under Brazeau (highest outlier); ratchet only 1.1-1.3x overhead; rep rate near-identical to Fitch. wagnerStarts=3 explains thorough>default gap on 0-rep datasets (86t/3660c, 225t) — better starting topology matters when TBR convergence takes >30s. Fitch-tuned presets are appropriate for Brazeau; no cycle count changes warranted. strategies.md updated. T-290c confirmatory run launched.
The C++ bridge now expects three structured lists (searchControl, runtimeConfig, scoringConfig) instead of flat keyword arguments. The old do.call() with a merged flat list no longer matches the Rcpp function signature and would error at runtime.
- searchControl: do.call(SearchControl, strategy) fills all defaults
- runtimeConfig: maxReplicates/targetHits/maxSeconds/verbosity/nThreads
- scoringConfig: EW defaults (concavity=-1 sentinel, xpiwe=FALSE)
Removes T-291 from to-do.md.
- Add MBANK_BRAZEAU_SAMPLE and load_mbank_brazeau_sample() / has_meaningful_inapp()
- Document two benchmark tracks (Fitch/Brazeau) with EW+IW weighting
- Replace stale 2026-03-22 phase baselines (had Drift at 24-32%) with T-290b post-T-255 baselines (no drift, Ratchet 63-76% dominant)
- Add Brazeau/Fitch per-phase cost ratios (Wagner 3.6-3.9x outlier)
- Add wagnerStarts analytical conclusion (wagnerStarts=3 correct; benefit from starting topology in 0-rep regime, not ratchet cycles)
Stage 4 multi-dataset validation (Hamilton, 5 datasets 131-206t, 10 seeds):
- 60s: mean delta +0.5 (neutral); project3701/146t regresses 12 steps; syab07205/206t: ZERO replicates complete (per-rep cost >= 60s budget)
- 120s: mean delta -9.1 steps (but project3701 outlier drives this; others <= 6 steps). Replicate ratio 0.68 vs baseline.
The 0-replicate failure at 206 tips / 60s budget is a showstopper for default preset use. Prune-reinsert (PR) remains available via SearchControl(). Set pruneReinsertCycles = 0L in large preset (was 5L, d=5%, MISSING).
…leaving); F-027/F-028/F-029 logged
In the TBR rerooting inner loop, evaluate 4 regraft candidates
simultaneously instead of one at a time. The 4 independent vroot_cache
row accesses are data-independent within each block iteration, so the
out-of-order CPU can serve them concurrently and hide L2 latency.
Changes:
- ts_fitch.h/cpp: add fitch_indirect_cached_flat_x4() (EW) and
fitch_na_indirect_cached_flat_x4() (NA) — process 4 vroot pointers per
block, exit when all 4 exceed cutoff (bitwise-AND combined test).
- ts_tbr.cpp: compute use_flat flag once per tbr_search call
(weight==1, no upweight_mask — normal EW search, not ratchet).
* SPR loop: use fitch_indirect_bounded_flat /
fitch_na_indirect_bounded_flat when use_flat (fewer CharBlock
struct dereferences).
* TBR rerooting inner loop: when use_flat && !use_iw, replace the
sequential ei loop with a batch-of-4 while loop. Collect up to 4
non-skipped candidates, call x4 batch function, update best from
all 4 results. Scalar fallback for trailing partial batches (< 4)
and for IW / ratchet-upweight paths.
IW and ratchet (upweight_mask) paths are unchanged.
All 28 test-ts-tbr-search + 23 constraint-small tests pass.
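The batching idea can be illustrated with a minimal sketch. It is not the package's fitch_indirect_cached_flat_x4(): the flat layout (one 64-bit word per state per block), the function names, and the demo data are all assumptions made for illustration. It shows four independent candidate rows scored per block, with a single bitwise-AND test that exits only once all four exceed the cutoff.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Portable population count (number of set bits).
inline int popcount64(std::uint64_t x) {
    int n = 0;
    while (x) { x &= x - 1; ++n; }
    return n;
}

// Scalar reference: extra steps for one candidate, bailing once past cutoff.
// Assumed layout: row[b * n_states + s] holds 64 characters' bits for state s.
int extra_steps_scalar(const std::uint64_t* clip, const std::uint64_t* cand,
                       int n_blocks, int n_states, int cutoff) {
    int steps = 0;
    for (int b = 0; b < n_blocks; ++b) {
        std::uint64_t hit = 0;
        for (int s = 0; s < n_states; ++s)
            hit |= clip[b * n_states + s] & cand[b * n_states + s];
        steps += popcount64(~hit);  // characters sharing no state cost one step
        if (steps > cutoff) break;
    }
    return steps;
}

// Batch of 4: the four candidate rows are data-independent within a block.
std::array<int, 4> extra_steps_x4(const std::uint64_t* clip,
                                  const std::array<const std::uint64_t*, 4>& cand,
                                  int n_blocks, int n_states, int cutoff) {
    std::array<int, 4> steps = {{0, 0, 0, 0}};
    for (int b = 0; b < n_blocks; ++b) {
        std::uint64_t hit[4] = {0, 0, 0, 0};
        for (int s = 0; s < n_states; ++s) {
            const std::uint64_t c = clip[b * n_states + s];
            for (int k = 0; k < 4; ++k) hit[k] |= c & cand[k][b * n_states + s];
        }
        for (int k = 0; k < 4; ++k) steps[k] += popcount64(~hit[k]);
        // Bitwise AND (not &&): one combined test, exit only when all four exceed cutoff.
        if ((steps[0] > cutoff) & (steps[1] > cutoff) &
            (steps[2] > cutoff) & (steps[3] > cutoff)) break;
    }
    return steps;
}

// Hypothetical demo data: 2 blocks x 2 states; every clip character holds state 0.
static const std::uint64_t ALL = ~0ULL;
static const std::uint64_t CLIP[4] = {ALL, 0, ALL, 0};
static const std::uint64_t SAME[4] = {ALL, 0, ALL, 0};      // 0 extra steps
static const std::uint64_t DIFF[4] = {0, ALL, 0, ALL};      // 64 extra steps per block
static const std::uint64_t ONE[4]  = {ALL - 1, 1, ALL, 0};  // 1 extra step

std::array<const std::uint64_t*, 4> demo_batch() {
    std::array<const std::uint64_t*, 4> rows = {{SAME, DIFF, ONE, SAME}};
    return rows;
}
```

In this sketch a candidate that stays under the cutoff always receives its exact count, because the loop only stops when every member of the batch has exceeded the cutoff. A candidate that exceeds it may report a larger partial count than the scalar path would, which is harmless when the value is only compared against the cutoff.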
…ge-tree PR cost
The bottleneck in the previous PR implementation was full TBR convergence
on the full-size tree after every prune-reinsert cycle (step 6 in
prune_reinsert_search). At 180 tips this takes ~7s/cycle; with c=5 cycles
that is ~35s of full-tree TBR before the outer-loop TBR runs anyway.
Two new SearchControl() parameters:
pruneReinsertNni = TRUE -- use NNI instead of TBR for full-tree polish
(~5x cheaper at >=120 tips; outer-loop TBR
restores full local optimality afterwards)
pruneReinsertFullMoves = N -- limit full-tree TBR to N accepted moves
(0 = converge, backward compat default)
Both default to backward-compatible values (NNI=FALSE, fullMoves=0).
The large preset still has pruneReinsertCycles=0; re-enable once
benchmarked with NNI polish.
…rning
When covr runs tests from its temp install path, spelling::spell_check_test() warns 'Failed to find package source directory'; this is harmless but was promoted to an error by error-on='warning' in the coverage GHA step. Muffle that specific warning with withCallingHandlers().
…eline
5 large-tree datasets (131-206 tips), 3 configs, 2 budgets, 10 seeds = 300 runs. Builds from feature/tbr-batch for pruneReinsertNni parameter.
… decision
Earlier comment described Stage 1 benchmark showing -14.7 steps improvement, which was misleading — Stage 4 multi-dataset testing (131-206t) found the per-rep overhead was too high (0 replicates at 206t/60s), so pruneReinsertCycles was set to 0. Clarify the rationale and decision.
F-008: Fix constrained drift constraint staleness (T-279)
F-T-245: TBR 4-wide candidate batching (EW flat path)
Stage 4 results analysed (G-001): syab07205/206t starvation at 60s from full-TBR polish per PR cycle (~7s x 5 = 35s overhead). Agent E implemented pruneReinsertNni fix on feature/tbr-batch; Stage 5 scripts uploaded and submitted to Hamilton (SLURM 16622224, ~4-6h).
…f Stage 5 running
The sector-internal RAS rebuild (build_ras_sector, fired only at rasStarts>=2) scored candidate insertion edges with fitch_indirect_length_bounded = the union-of-finals proxy, the same undercounting formula fixed in wagner_tree. Replaced with compute_insertion_edge_sets + fitch_indirect_length_cached (exact per-character directional E[D]=combine(prelim[D],up[D]); the up-message carries the anchored HTU/rest-of-tree state). Default rasStarts=1 path is byte-identical (build_ras_sector not invoked). At rasStarts=3 on Zanol2014 rss-from-Wagner-T0: 1263->1262 (within +1 of target); rasStarts=6 equal; never regresses. Sector/SearchControl/driven tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diag_tbr_falseconv_check.R: runs default TBR to convergence then enumerates the full canonical-TBR neighbourhood (TBRMoves) to count improving moves. Confirms on the post-vroot-fix build that good Wagner starts reach genuine optima (0 improving) while poor random starts strand with only 1-9 (vs the chip's pre-fix 40+). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diag_sectras_sweep.R: rss-only from a Wagner T0, rasStarts in {1,3,6}, per dataset, with wall-clock. Shows rasStarts=3 closes the sectorial gap to +1 (Zanol 1269->1262, Zhu 631->625, Wortley 479->480) with no gain at 6, at ~3-5x rss cost. UNBOUNDED rss (maxSeconds=0); a time-matched comparison is still needed before any preset change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diag_sectras_timematched.R: rss-only under a fixed wall-clock budget (rssRounds high so maxSeconds bounds). At 30s, rasStarts=3 beats rasStarts=1 by 5-8 steps (Zanol +1 vs +6, Zhu +0 vs +8) on 2 seeds each -- the deeper-per-sector tradeoff wins even time-matched, not just unbounded. Local timing indicative; full-search time-matched (Hamilton) is the gate before changing the auto-selected thorough preset. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
TNT runs 3 RAS+TBR restarts per sector. On the post-(Wagner+build_ras_sector)-fix build this closes the sectorial gap from ~+7/+8 to ~+1 over the MPT and wins time-matched by 5-8 steps at a 30s rss budget (Zanol/Zhu), with no gain at 6 (diag_sectras_{sweep,timematched}.R). Added to the opt-in intensive preset only (never auto-selected); the auto-selected thorough preset keeps rasStarts=1 pending a full-search time-matched Hamilton gate. Functional check: strategy=intensive reaches 1262 (+1) on Zanol in 8s.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eness oracle Add TBRParams::unrooted (default off): after converging at one rooting, tbr_search re-roots at each tip and re-descends until a full tip-sweep finds no strict improvement, reaching a true unrooted-TBR local optimum. The rooted representation otherwise cannot break the root edge (nor, with the smaller-side clip filter, clip edges whose smaller side holds the root); parsimony length is root-invariant so re-rooting only changes the representation. Gated to the plain search (no sector/constraint/tabu/pool); exposed via ts_tbr_diagnostics(..., unrooted=). Default path unchanged. Validated with a small-tree differential oracle (dev/benchmarks/tbr_oracle.R): in-kernel single call reaches 0 canonical-improving (all_tbr/all_spr) neighbours at 12 and 16 tips; real-data 74-tip Zanol result is canonical-TBR-clean. 238 MaximizeParsimony + targeted TBR/sector/search tests pass. Context (post directional-vroot scoring fix 2b299e4): that fix resolved most apparent move-incompleteness (oracle 23/40 -> 9/60); the residual is purely the root-edge limitation. Reroot gain on Zanol is marginal (median ~3; 0-1 from Wagner starts) at ~6.5x per-call cost, and does not close the gap to TNT (basin/escape, separate). Kept opt-in pending micro-optimisation. Full analysis: dev/plans/2026-06-18-tbr-shared-start.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diag_thorough_rasstarts_tm.R (local) + hamilton_thorough_rasstarts.R + SLURM wrapper. KEY RESULT (local, 60s, 2 seeds): full thorough reaches the optimum at rasStarts=1 on Zanol (+0) and Zhu (+0); rasStarts=3 adds nothing. The rss-only rasStarts win is REDUNDANT in the full pipeline (ratchet/drift/fuse/multi-replicate already escape). Hamilton grid kept ready for a larger-dataset / shorter-budget regime where rss efficiency might still matter. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ate) The full thorough/intensive pipeline reaches the optimum at rasStarts=1 on Zanol/Zhu at 60s; the rss-only rasStarts win is redundant once ratchet/drift/fuse/multi-replicate are in play. No demonstrated full-search benefit, so per the 'ship only if end-to-end improves' rule, revert to rasStarts=1. Kept a breadcrumb comment + the harnesses to revisit for larger datasets / shorter budgets. The build_ras_sector C++ correctness fix (commit 93071ca) stays -- it helps anyone setting rasStarts>1 manually and never regresses. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… bug Post-fix gap-panel re-measurement (diag_gap_panel_postfix.R, thorough, 60s, 3 seeds): Wortley 479 (-1), Zanol/Zhu/Giles +0 -- full search now reaches the MPT across the hard panel. Overturns the Phase 1-3/Challenge-2 'landscape/escape-bound floor, not frugality-bound' conclusion: the +1/+3 was substantially the union-of-finals insertion-cost bug (commits 2b299e4, 93071ca). Candidates-per-improvement reversed (Vinther 6.3x worse -> 0.44x). Closes the core-deficit (#26) and drift-for-+1 (#25, now moot) threads; race-to-target reached, leaving only the Hamilton wall-clock ratio open (#22). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hamilton_timing.R + timing_hamilton.sh (run-only, reuses pre-built lib + staged 64-bit TNT). Times TreeSearch (default/thorough) vs representative TNT configs (mult basic, xmult default, xmult level 10) on identical 64-bit hardware; re-scores TNT trees via TreeLength. One dataset per job. TNT 1.6 64-bit staged at /nobackup/pjjg18/TreeSearch/tnt; PUL accepted (persists via ~/.passwordfile.tnt) so headless runs work with LD_LIBRARY_PATH=TNT-bin + TERM=xterm. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
library(TreeSearch, lib.loc=) does not put the lib on .libPaths, so library(TreeTools) and TreeSearch's own dependency loading failed with 'no package TreeTools'. Prepend .libPaths(c(TS_LIB, .libPaths())) instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the O(n_tip) physical-reroot sweep (badb73b) with a single-pass direct enumeration of the root-position-dependent moves the rooted clip loop skips, under the opt-in `unrooted` flag. Default path byte-identical.
EW: three relaxations under do_reroot (nz/ns rerooting-loop skip, L872 smaller-side-clip filter, direct in-pass root-edge via try_root_edge_moves). Completeness oracle 0/N @ n=12/16/20.
IW: two fixes make the indirect IW scan EXACT.
1. Directional scoring: route IW to compute_insertion_edge_sets / indirect_iw_length_cached (was the move-hiding union-of-finals approximation, left EW-gated by the parent's 2b299e4).
2. Clip-internal base: add_clip_internal_steps() adds the clipped subtree's internal Fitch homoplasy back into base_iw. spr_clip physically removes the subtree, so extract_char_steps over the clipped postorder omitted it — a PRE-EXISTING, UNIVERSAL under-count that corrupted cross-clip IW move ranking (masked by the exact full_rescore self-correct). Found via TS_IW_SCANCHK (predicted vs exact on every accept): 53 rooted / 75 unrooted mispredicts -> 0/0.
Validation: IW oracle 3/60 -> 0/60; pure-IW scan exact; ~6.7x faster than physical-reroot IW; 149 IW tests pass; EW byte-identical; multi-start quality neutral. Also fixes rooted-IW move-selection for all users.
NA: investigated, left at production baseline. Same base omission but the NA Pass-3 reads whole-tree uppass context -> attachment-dependent, mixed-sign residual -> cannot be made exact. Ground-truth oracle (tbr_oracle_na.R) shows BOTH direct and physical-reroot incomplete for NA+IW; NA needs exact-verify-at-convergence (deferred). X-internal gated `if (!has_na)`.
Diagnostic env: TS_IW_SCANCHK (predicted-vs-exact scan check, all scorers).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The NA (inapplicable / Brazeau three-pass) indirect scan is only APPROXIMATE:
the divided+reconnect decomposition is not exact (the clipped subtree's
internal step count is attachment-dependent — Pass-3 reads whole-tree uppass
context), so the fast inner clip loop declares convergence while improving
moves remain. The ground-truth oracle (tbr_oracle_na.R, ts_fitch_score on real
inapplicable data) confirmed BOTH the direct scan AND the physical-reroot path
leave improving NA neighbours (Aria2015: direct EW 6/10, IW 8/10; Agnarsson EW
4/10) — so neither was a sound completeness mechanism for NA.
Add exact_verify_sweep(): at convergence, sweep the ENTIRE unrooted-TBR
neighbourhood (clip_node x {identity + fragment rerootings} x divided-tree
regraft edges) scoring each candidate EXACTLY via apply_tbr_move + full_rescore,
apply the first strict improver (first-improvement; the cheap approximate loop
re-climbs between calls), repeat until 0-improving = a true unrooted-TBR
optimum. Regraft edges are built directly from the unclipped tree (every
(parent[c], c) with c outside the clipped subtree, plus the merged (nz, ns)
edge), so apply_tbr_move re-clips exactly as the scan's accept path does — no
spr_clip churn. Gated on has_na at the convergence dispatch, so EW/IW (whose
scans are exact) keep try_root_edge_moves and are byte-identical.
Validation: NA oracle direct EW/IW 6-8/10 -> 0/6 COMPLETE (Aria2015); pure
EW/IW oracle unchanged (0/30); IW scan-check 0; 149 IW tests pass; regression
suite: 2 pre-existing EW failures only. Cost: ~0.13s/start on Aria2015 (35 tips),
~2.6x physical-reroot — modest (amortised by first-improvement + the cheap
inner loop). Scales O(n^3 x rescore) per convergence; large-tree optimisation
is a follow-up. unrooted is opt-in, so default searches are unaffected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
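The control flow described above (a cheap approximate climb, then an exact first-improvement sweep, repeated until the sweep finds nothing) can be sketched on a toy landscape. Everything here is hypothetical: the "tree" is just an index into a cost vector, the cheap loop sees only +/-1 neighbours as a stand-in for the approximate scan, and the exact sweep sees +/-1 and +/-2 as a stand-in for the full unrooted-TBR neighbourhood.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy landscape: lower cost is better.
std::vector<int> toy_cost() {
    const int values[] = {5, 4, 6, 3, 7, 2, 8};
    return std::vector<int>(values, values + 7);
}

// Stand-in for the fast approximate inner loop: only sees +/-1 neighbours,
// so it can declare convergence while improving moves remain.
std::size_t cheap_climb(const std::vector<int>& cost, std::size_t x) {
    bool moved = true;
    while (moved) {
        moved = false;
        if (x > 0 && cost[x - 1] < cost[x]) { --x; moved = true; }
        else if (x + 1 < cost.size() && cost[x + 1] < cost[x]) { ++x; moved = true; }
    }
    return x;
}

// Stand-in for the exact sweep: scores the whole neighbourhood exactly and
// applies the first strict improver. Returns false when none exists.
bool exact_sweep_toy(const std::vector<int>& cost, std::size_t& x) {
    const long deltas[] = {-2, -1, 1, 2};
    const long n = static_cast<long>(cost.size());
    for (int i = 0; i < 4; ++i) {
        const long y = static_cast<long>(x) + deltas[i];
        if (y < 0 || y >= n) continue;
        if (cost[static_cast<std::size_t>(y)] < cost[x]) {
            x = static_cast<std::size_t>(y);
            return true;
        }
    }
    return false;
}

// Re-climb with the cheap loop between exact sweeps; stop at 0 improvers.
std::size_t search_to_verified_optimum(const std::vector<int>& cost, std::size_t start) {
    std::size_t x = cheap_climb(cost, start);
    while (exact_sweep_toy(cost, x)) x = cheap_climb(cost, x);
    return x;
}
```

Starting from index 0, the cheap climb alone stalls at index 1 (cost 4), whereas the verified search continues through index 3 to index 5 (cost 2), where the exact sweep finds no strict improver.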
Output of hamilton_timing.R (Hamilton, single-thread): per-dataset
time-to-optimum for TreeSearch {default,thorough} vs TNT
{mult-basic,xmult-default,xmult-level10}, 3 seeds each.
Post-Wagner-fix quality is at parity -- TS reaches the optimum on all
four datasets. But the wall-clock gap to TNT's thorough mode (xmult
level 10) is ~2x Wortley, ~3x Giles, ~14x Zhu, ~16x Zanol: an order of
magnitude, not the audit's pre-fix ~2x. TNT xmult-default is faster
still (0.1-0.3s) but unreliable (Zanol seeds 2-3 = 1262; Wortley = 481/482).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Six diagnostics from the TNT-parity investigation:
- diag_convergence_{ab,fidelity,tail,enum}.R: xmult-style convergence
early-stop via consensusStableReps. Score-safe (0 loss across 21 runs)
but returns an OVER-resolved consensus (cidArm2ref > cidFull2ref, false
support confidence) -- verdict = ship opt-in, NOT thorough-default.
- diag_treespace_{pool,sampling}.R: isolates poolMaxSize / sampling
effects on strict-consensus resolution (default 100 vs TNT hold 10000).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
All three scorers (EW, IW, NA) now reach guaranteed true unrooted-TBR optima by default. EW/IW via the in-pass root-edge check; NA via the exact-verify sweep at convergence. Gated out for sector/constrained/tabu/pool sub-searches (state would be invalidated) so those are unaffected.
Two pre-existing test failures (tbr<=spr, ratchet<=tbr) resolved as a side-effect: the rooted kernel was getting stuck above the unrooted optimum so TBR appeared no better than SPR.
Adjust test-ts-sector-resolve.R: raise targetHits to 99 so both rasStarts configurations run the full two replicates; under the new default the better initial tree caused rasStarts=3 to terminate after one replicate (fewer total candidates) even though per-sector work was correctly higher.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
exact_verify_sweep runs a full O(n^3) unrooted-TBR neighbourhood sweep at every NA convergence to certify a true optimum. The same converged topologies are re-verified many times across a search (ratchet restore passes, per-cycle TBR polish), each repeating the entire sweep.
Memoize FALSE (genuine-optimum) verdicts in a thread_local cache. The result is a pure function of (topology, dataset, weighting regime), so a cached FALSE stays valid until one of those changes. Key = hash(canonical child-pairs) XOR dataset-fingerprint XOR weight-fingerprint; the dataset-fingerprint alone is the clear-trigger, so base-regime entries survive perturbation excursions and are reused.
The weight-fingerprint (per-block active_mask, upweight_mask, pattern_freq -- exactly the fields ts_ratchet.cpp's save_perturb_state snapshots) is essential, not optional: the ratchet mutates the weighting in place and runs NA TBR under both perturbed and base weights within one cycle (the default strategy's ZERO_ONLY mode zeroes active_mask every cycle). Without it, a base-regime "optimal" verdict would be reused during a perturbed pass, silently skipping the improving moves the ratchet exists to find -- an NA-search-quality regression the NA oracle (which never ratchets) cannot catch.
Validated on Zanol2014 (74 tips, NA), default strategy, seed 1, 3 reps: cache-on score == cache-off (TS_EV_NOCACHE) score == 1315 with identical tree count -- the cache is behaviourally transparent (a hit returns the same FALSE the full sweep would) -- at 364.5s vs 416.1s wall (-12.4%). TBR regression suite: 0 failures/errors.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
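A minimal sketch of the key composition follows, under loud assumptions: the mixing function is a generic FNV-1a-style step chosen for illustration (the commit does not say which hash the package uses), and the `Regime` and `OptimumCache` types and all function names are hypothetical. It shows the three XORed terms, the dataset fingerprint acting as the only clear-trigger, and why a base-regime verdict is not reused after a weight change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

typedef std::uint64_t u64;

// One FNV-1a-style mixing step over a 64-bit word (illustrative hash only).
inline u64 mix(u64 h, u64 word) { return (h ^ word) * 1099511628211ULL; }

// Child pairs are sorted within each node, so left/right order is irrelevant.
u64 topology_hash(const std::vector<std::pair<int, int> >& children) {
    u64 h = 1469598103934665603ULL;
    for (std::size_t i = 0; i < children.size(); ++i) {
        h = mix(h, static_cast<u64>(std::min(children[i].first, children[i].second)));
        h = mix(h, static_cast<u64>(std::max(children[i].first, children[i].second)));
    }
    return h;
}

// Hypothetical stand-in for the per-block weighting state the ratchet mutates.
struct Regime { std::vector<u64> active_mask, upweight_mask, pattern_freq; };

u64 weight_fingerprint(const Regime& r) {
    u64 h = 1469598103934665603ULL;
    for (std::size_t i = 0; i < r.active_mask.size(); ++i) h = mix(h, r.active_mask[i]);
    for (std::size_t i = 0; i < r.upweight_mask.size(); ++i) h = mix(h, r.upweight_mask[i]);
    for (std::size_t i = 0; i < r.pattern_freq.size(); ++i) h = mix(h, r.pattern_freq[i]);
    return h;
}

u64 cache_key(const std::vector<std::pair<int, int> >& children, u64 dataset_fp,
              const Regime& r) {
    return topology_hash(children) ^ dataset_fp ^ weight_fingerprint(r);
}

// Memoizes only "genuine optimum" verdicts; a new dataset clears everything.
struct OptimumCache {
    std::unordered_set<u64> optima;
    u64 dataset_fp;
    OptimumCache() : dataset_fp(0) {}
    bool known(u64 key, u64 current_fp) {
        if (current_fp != dataset_fp) { optima.clear(); dataset_fp = current_fp; }
        return optima.count(key) != 0;
    }
    void record(u64 key) { optima.insert(key); }
};

// True when a verdict cached under base weights is found again under base
// weights but NOT reused once one active_mask word has been zeroed.
bool demo_perturbed_pass_misses() {
    std::vector<std::pair<int, int> > children;
    children.push_back(std::make_pair(0, 1));
    children.push_back(std::make_pair(4, 2));
    Regime base;
    base.active_mask.assign(2, ~0ULL);
    base.upweight_mask.assign(2, 0);
    base.pattern_freq.assign(2, 1);
    Regime perturbed = base;
    perturbed.active_mask[0] = 0;  // a single-word regime change
    const u64 fp = 42;
    OptimumCache cache;
    const u64 base_key = cache_key(children, fp, base);
    if (!cache.known(base_key, fp)) cache.record(base_key);
    return cache.known(base_key, fp) && !cache.known(cache_key(children, fp, perturbed), fp);
}

// True when swapping left/right children leaves the topology hash unchanged.
bool demo_child_order_ignored() {
    std::vector<std::pair<int, int> > a, b;
    a.push_back(std::make_pair(0, 1)); a.push_back(std::make_pair(4, 2));
    b.push_back(std::make_pair(1, 0)); b.push_back(std::make_pair(2, 4));
    return topology_hash(a) == topology_hash(b);
}
```

Because the dataset fingerprint is checked separately from the key, entries recorded under base weights stay in the set during a perturbed pass and are found again once the base regime is restored.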
…rds)
The comment claimed the hash is "root-position-independent (any rooting of the same unrooted topology produces the same hash)". That is false and, worse, backwards: sorting each child pair canonicalizes only the left/right child order WITHIN a node, not across rerootings. Different rootings renumber the internal nodes and flip parent/child directions, so they hash differently (verified: two rootings of one unrooted tree differ at 55/73 internal nodes).
That root-DEPENDENCE is exactly what makes the memoization cache correct: exact_verify_sweep is itself root-dependent (it skips root-child clips, leaving a residual completeness gap — task #19), so each rooting has a different neighbourhood and must be cached separately. The previous wording invited a future "canonicalize to root-independent" optimization that would cause cross-rooting cache hits to suppress improvers one rooting finds and another misses.
Comment-only; no behaviour change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The exact_verify_sweep optimum cache memoizes "genuine NA optimum" verdicts, which are valid only under the current weighting regime. The ratchet mutates active_mask/upweight_mask/pattern_freq in place mid-search, so a base-regime verdict leaking into a perturbed pass silently skips the improvers the ratchet exists to find. That regression is invisible to the NA oracle (never ratchets) and to final scores (always recomputed), so guard it deterministically at the key level:
- extract the key into one shared helper, exact_verify_cache_key(), called by BOTH the cache and the probe, so dropping a term from the key line is caught -- not just a broken fingerprint function;
- ts_ev_cache_key_probe export returns the exact key; its flags reproduce the three ways the ratchet changes the regime (zero_active = ZERO_ONLY, the default NA strategy; set_upweight; bump_pattern_freq);
- test-ts-na-evcache.R asserts the composite key changes for each regime field AND for topology/dataset, and is deterministic;
- TS_EV_AUDIT (off by default): on a cache hit, re-run the full sweep and abort if it finds an improver -- a live tripwire for contamination.
Mutation-validated: dropping `^ weight_fingerprint` fails exactly the three regime assertions (topology/dataset/determinism still pass). Default path byte-identical; TBR regression suite 0 failures.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
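The audit tripwire can be sketched as follows. This is an assumption-laden illustration, not the package's code: `certify_optimum` and its parameters are hypothetical names, the sweep is passed in as a callback, the `audit` flag is supplied by the caller rather than read from TS_EV_AUDIT, and a thrown exception stands in for the abort so the behaviour can be observed in a test.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <unordered_set>

// Returns true if the topology behind `key` is certified optimal.
// `sweep_finds_improver` stands in for the full exact sweep.
bool certify_optimum(std::unordered_set<std::uint64_t>& cache, std::uint64_t key,
                     const std::function<bool()>& sweep_finds_improver, bool audit) {
    if (cache.count(key) != 0) {
        // Cache hit: normally trust the memoized verdict and skip the sweep.
        // Under audit, re-run the sweep; an improver means the key is unsound.
        if (audit && sweep_finds_improver())
            throw std::logic_error("optimum cache contaminated: improver found on a hit");
        return true;
    }
    if (sweep_finds_improver()) return false;  // not an optimum, nothing cached
    cache.insert(key);                         // memoize genuine-optimum verdicts only
    return true;
}

bool always_improver() { return true; }
bool never_improver() { return false; }

// True when a stale entry is caught by the audit path.
bool demo_audit_trips() {
    std::unordered_set<std::uint64_t> cache;
    cache.insert(7);  // simulate a verdict cached under a different regime
    try {
        certify_optimum(cache, 7, always_improver, true);
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}

// True when, without audit, the same stale entry is silently trusted.
bool demo_no_audit_trusts_cache() {
    std::unordered_set<std::uint64_t> cache;
    cache.insert(7);
    return certify_optimum(cache, 7, always_improver, false);
}

// True when a miss with no improver records the key for later hits.
bool demo_miss_records_optimum() {
    std::unordered_set<std::uint64_t> cache;
    return certify_optimum(cache, 9, never_improver, false) && cache.count(9) == 1;
}
```

The second demo shows the failure mode the tripwire exists for: with audit off, a contaminated entry is indistinguishable from a valid one, which is why the key-level assertions are the primary guard.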
The exact directional insertion edge-set computation (the EW per-clip
hotspot, ~27% of full-EW CPU per VTune) reallocated and zero-filled its
edge_set and up-message buffers on every clip. Both are fully written
before any read for every in-tree non-root node (the only slots any
reader touches), so the zero-fill was pure waste.
Hoist `up` and `pre` to caller-owned scratch passed by reference (NOT
thread_local — function-local per call, so per-thread safe) and replace
the per-call assign(N,0) with a non-zeroing size-ensure. A debug-only
(#ifndef NDEBUG) write-before-read guard records every written non-root
slot and asserts completeness against the in-tree node set, so a stale
slot can never be read. Threaded through all three call sites
(tbr_search, build_ras_sector, wagner_tree).
Verified:
- EW bit-identical: score AND candidates_evaluated exact over
{Wortley2006,Zhu2013,Zanol2014} x seeds{1,2}, reps3.
- NA single-threaded bit-identical: Vinther2008 x seeds{1..4}, exact.
- 276 kernel search tests pass (tbr/wagner/sector/ratchet/drift/fitch).
- Wall: -16.4% on Zanol2014 (largest), -9.4% sum on heavy runs; the
saving scales with O(n_node * total_words), as predicted.
NB an unrelated pre-existing crash exists in the parallel (nThreads>=2)
NA path (Vinther2008): it reproduces on the unmodified baseline, is a
timing-dependent race, and is independent of this bit-identical change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
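The buffer change above can be sketched as below. The kernel is a hypothetical stand-in (it is not compute_insertion_edge_sets), and the names `ensure_size` and `clip_kernel` are assumptions; the point is the grow-only size-ensure on caller-owned scratch plus a debug-only write-before-read guard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Grow-only size-ensure: existing slots are left untouched, so there is no
// per-call zero-fill. (resize() zero-initialises only newly added slots.)
inline void ensure_size(std::vector<std::uint64_t>& buf, std::size_t n) {
    if (buf.size() < n) buf.resize(n);
}

// Hypothetical per-clip kernel. `up` is caller-owned scratch passed by
// reference; only slots of `in_tree` nodes are written, and only those are read.
std::uint64_t clip_kernel(const std::vector<std::size_t>& in_tree, std::size_t n_node,
                          std::uint64_t seed, std::vector<std::uint64_t>& up) {
    ensure_size(up, n_node);
#ifndef NDEBUG
    std::vector<char> written(n_node, 0);  // debug-only write-before-read guard
#endif
    for (std::size_t i = 0; i < in_tree.size(); ++i) {
        up[in_tree[i]] = seed + in_tree[i];
#ifndef NDEBUG
        written[in_tree[i]] = 1;
#endif
    }
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < in_tree.size(); ++i) {
        assert(written[in_tree[i]] && "stale scratch slot read");
        acc += up[in_tree[i]];
    }
    return acc;
}

// Reuses one scratch buffer across two clips with different in-tree node sets.
std::uint64_t demo_reused_scratch() {
    std::vector<std::uint64_t> up;
    std::vector<std::size_t> first, second;
    first.push_back(1); first.push_back(2); first.push_back(3);
    second.push_back(2); second.push_back(4);
    clip_kernel(first, 5, 10, up);          // leaves stale values in slots 1 and 3
    return clip_kernel(second, 5, 100, up); // reads only slots 2 and 4
}

// Same second clip on a fresh buffer, for comparison.
std::uint64_t demo_fresh_scratch() {
    std::vector<std::uint64_t> up;
    std::vector<std::size_t> second;
    second.push_back(2); second.push_back(4);
    return clip_kernel(second, 5, 100, up);
}
```

Stale values from the first clip remain in the buffer, but the result matches a fresh buffer because every slot that is read has been written in the same call.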
…ergence
exact_verify_sweep (the NA convergence certifier) clips every internal node EXCEPT root-children (the nx==n_tip guard), so the one unrooted edge the display root sits on (cL-cR) was never enumerated. On poor NA starts that left a root-edge improver undetected: the kernel declared convergence above the true unrooted-TBR optimum (dev/benchmarks/tbr_oracle_na.R: 2/20 on Zanol2014, e.g. start #14 stalled at 1323 with a 1320 neighbour).
Fix: enumerate that one edge exactly at the optimum exit, reusing the already-tested try_root_edge_moves_rescore (the same apply+full_rescore path IW uses at convergence). The clip loop covers the 2n-4 non-root edges; the root-edge check covers the 1 remaining edge; together they certify all 2n-3 unrooted edges, so a FALSE return — and the memoized optimum — now means a true unrooted-TBR optimum. The change strictly enlarges the certified neighbourhood and only ever applies strict improvers, so it can only remove "converged-with-improver" failures and cannot worsen any score.
Validation: Zanol2014 start #14 now reaches 1320 with no improver; small-tree NA oracle 0/100; IW unaffected (already complete, 0/60); TBR regression suite 0 failures. Adds dev/benchmarks/tbr_oracle_na_small.R (fast high-N oracle) and tests/testthat/test-ts-na-complete.R (Tier-3 regression pinning start #14).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
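The edge arithmetic (2n-4 clip-loop edges plus 1 root edge gives 2n-3 unrooted edges) can be checked on a small rooted binary tree. This is a sketch under assumptions: the caterpillar builder and function names are hypothetical, and the only detail taken from the text is that the root is node n_tip and that clips at root-children are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parent array of a rooted caterpillar tree with n_tip >= 3 tips.
// Tips are 0..n_tip-1, internal nodes n_tip..2*n_tip-2, root = n_tip.
std::vector<int> caterpillar_parents(int n_tip) {
    std::vector<int> parent(static_cast<std::size_t>(2 * n_tip - 1), -1);
    for (int i = 0; i + 1 < n_tip; ++i) {
        parent[static_cast<std::size_t>(i)] = n_tip + i;  // tip i hangs off internal node n_tip+i
        if (i + 2 < n_tip)
            parent[static_cast<std::size_t>(n_tip + i + 1)] = n_tip + i;  // next internal node
    }
    parent[static_cast<std::size_t>(n_tip - 1)] = 2 * n_tip - 2;  // last tip joins the deepest node
    return parent;
}

// Edges reachable by a clip loop that skips root-children: every (parent[c], c)
// where c is neither the root nor a child of the root.
int clip_loop_edges(const std::vector<int>& parent, int root) {
    int n = 0;
    for (std::size_t c = 0; c < parent.size(); ++c)
        if (static_cast<int>(c) != root && parent[c] != root) ++n;
    return n;
}

// The two root-child edges merge into one unrooted edge, checked separately.
int certified_edges(int n_tip) {
    return clip_loop_edges(caterpillar_parents(n_tip), n_tip) + 1;
}
```

For a 74-tip tree such as Zanol2014 this gives 144 clip-loop edges plus the root edge, that is 145 = 2(74) - 3. The count depends only on the tree being binary, not on the caterpillar shape used here.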
Records why the exact O(1) "EW additive" directional scan is provably unavailable for NA (the per-node Pass-3 step is not 2-local — it reads the global applicable- region resolution; code already gates NA off the additive path), why a richer fixed-size-message DP is plausible but research-grade (the one directional NA message that exists, fitch_na_indirect_length, is deliberately approximate), and recommends the cheaper lever for #18: incremental EXACT rescore (pruning via the approximate scan + a localised Pass-3 delta on top of fitch_na_dirty_*), validated against the now-complete exact_verify_sweep oracle. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…build
Rogue hard-imports Rfast, which compiles from source under the r-hub gcc-asan container (~30 min). All Rogue usage in TreeSearch is already requireNamespace-guarded (two vignettes, Shiny consensus module), so it skips cleanly when absent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…253)
The parallel NA search (nThreads>=2) intermittently aborted with STATUS_HEAP_CORRUPTION. Root cause: the per-thread scratch in the TBR kernel (ts_tbr.cpp, ts_fitch.cpp) and exact_verify_sweep's optimum cache were function-local `static thread_local`. On MinGW these resolve via emutls, whose thread_local teardown across std::thread spawn/exit corrupts the heap. EW is unaffected (light TLS); the NA path trips it because exact_verify adds a thread_local unordered_set plus more scratch.
Fix: convert all worker-reachable scratch to plain function-locals (each worker owns its call frame -> per-thread-safe; per-clip realloc measured <=1.6% on 88-tip data, ~0% typical). Move exact_verify_sweep's optimum memoization to mutable members on DataSet so it keeps the same per-worker, cross-replicate persistence the thread_local had, without emutls.
Verified on clean builds (rm src/*.o; CCACHE_DISABLE=1; --preclean): parallel NA survives 120/120 (was iter ~4-8), EW 200/200, serial scores bit-identical, NA perf 4.15s (cache intact, vs 5.81s cache-disabled).
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…ot loops
Bank the validated micro-lever sweep from branch claude/tbr-microlevers (task #48). All changes are BYTE-IDENTICAL: score + candidates_evaluated unchanged on Wortley2006/Zhu2013/Zanol2014 x seed{1,2} (verify_l1.R, 6/6).
THE WIN — a diagnostic std::getenv("TS_REVERT_CHECK") left in the per-clip teardown (~100k+ calls/search) was costing 13-19% of EW MaximizeParsimony wall on Windows/ucrt, where getenv is microsecond-scale (locked env-block scan), not sub-nanosecond. Hoisted to a per-call bool. Quiet-machine same-seed paired A/B: Zanol -13.2% (20/20, p=0), Zhu -19.1% (12/12, p=0); 3-way attribution proves the getenv hoist alone is the entire win.
Also folded in (both byte-identical, both ~0 measured effect, kept as exact cleanups):
- cutoff hoist: maintain the EW/NA bail cutoff across the clip, recompute only on improvement (+0.00%, attribution-proven).
- kept_ei: precompute sub_edge-invariant reroot skip predicates once per clip (marginal/wash even at Zanol-1261; droppable).
Caveat: getenv magnitude is env-size + platform dependent (Windows/ucrt large; Linux cheaper) — Hamilton/Linux confirmation owed. Byte-identical and strictly removes ~100k getenv/search regardless.
Detail: dev/profiling/findings.md T-P5n + dev/profiling/tbr-microlever-sweep.md
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
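The getenv hoist follows a simple pattern, sketched here with hypothetical function names and a placeholder loop body (the real per-clip teardown is not shown in the commit message). Only std::getenv and the TS_REVERT_CHECK variable name come from the text.

```cpp
#include <cassert>
#include <cstdlib>

// Before: the environment is queried on every clip teardown.
long revert_checks_per_clip_lookup(int n_clips) {
    long checks = 0;
    for (int clip = 0; clip < n_clips; ++clip) {
        // ... per-clip work would go here ...
        if (std::getenv("TS_REVERT_CHECK") != nullptr) ++checks;  // one lookup per clip
    }
    return checks;
}

// After: one lookup per call, cached in a local bool for the whole loop.
long revert_checks_hoisted(int n_clips) {
    const bool revert_check = std::getenv("TS_REVERT_CHECK") != nullptr;
    long checks = 0;
    for (int clip = 0; clip < n_clips; ++clip) {
        // ... per-clip work would go here ...
        if (revert_check) ++checks;
    }
    return checks;
}
```

Both versions take the same branches as long as the variable does not change during the call, which is what makes the change behaviour-preserving; the hoisted form simply avoids repeating the lookup inside the loop.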
Manual testing underway; the Shiny app in particular has some usability issues.