Draft
766 commits
fdf2567
docs: add pruneReinsertTbrMoves to SearchControl.Rd (codoc fix)
ms609 Mar 28, 2026
ab8b415
chore: E-003 codoc fix logged; agent-E.md updated
ms609 Mar 28, 2026
c818923
chore: S-COORD-41 (codoc fix; T-289 Stage 4 running; PR status)
ms609 Mar 28, 2026
35f8abc
chore: T-290 complete — Brazeau phase profiling + wagnerStarts analysis
ms609 Mar 28, 2026
f1ed5df
fix(T-291): update benchmark_run() to new ts_driven_search interface
ms609 Mar 28, 2026
d07f338
chore: T-291 complete; to-do.md updated
ms609 Mar 28, 2026
1387e62
chore: S-COORD-41 update — PR #210 CI status post-codoc-fix
ms609 Mar 28, 2026
c62cb69
docs: update AGENTS.md with T-290b Brazeau phase profiling findings
ms609 Mar 28, 2026
f14cad0
chore: E-005 S-RED ts_strategy.h + ts_temper complete (no bugs)
ms609 Mar 28, 2026
31d2c95
chore: re-remove T-290 (completed G-001; clobbered by concurrent d07f…
ms609 Mar 28, 2026
7469852
fix(T-289): disable pruneReinsert in large preset
ms609 Mar 28, 2026
8254c91
chore: T-289 complete; agent-E IDLE
ms609 Mar 28, 2026
2cef34d
chore: S-COORD-42; T-269 complete (no benefit from fine-grained inter…
ms609 Mar 28, 2026
927070a
chore: S-RED focus 28 complete — ts_mc_fitch, ts_tabu, ts_prune_reins…
ms609 Mar 28, 2026
038e00a
perf(T-245): TBR 4-wide candidate batch + flat-variant switch
ms609 Mar 28, 2026
34901c4
chore: agent-F T-245 ASSIGNED; GHA 23690208221 dispatched
ms609 Mar 28, 2026
09c9346
feat(T-289b): pruneReinsertNni + pruneReinsertFullMoves to reduce lar…
ms609 Mar 28, 2026
7943e60
fix: suppress covr false-positive from spell_check_test source-dir wa…
ms609 Mar 28, 2026
7ac5a7d
chore: S-COORD round 43 — spelling fix; T-245 parked; PR #210 re-trig…
ms609 Mar 28, 2026
aa3f16e
chore(T-289f): Stage 5 benchmark — PR NNI polish vs TBR polish vs bas…
ms609 Mar 28, 2026
f1e9c4c
docs: update large preset PR comment to reflect T-289 Stage 4 disable…
ms609 Mar 28, 2026
80ece4f
chore: S-COORD round 44 — T-245 GHA PASS; S-RED focus 29 clean; PR pe…
ms609 Mar 28, 2026
d67bed2
chore: T-245 status → PR #238; update S-COORD/S-PR notes
ms609 Mar 28, 2026
f6318da
chore: agent-e PARKED — T-289f NNI polish done, awaiting GHA + Hamilton
ms609 Mar 28, 2026
f9e59b4
chore: agent-F IDLE after T-245 + spelling fix + S-RED focus 29
ms609 Mar 28, 2026
93d000a
Merge pull request #237 from ms609/feature/drift-constraint-fix
ms609 Mar 28, 2026
7207e0b
Merge pull request #238 from ms609/feature/tbr-batch
ms609 Mar 28, 2026
c5d92af
chore: agent-e T-289f Stage 5 dispatched (SLURM 16622224)
ms609 Mar 28, 2026
5f047c9
chore: S-COORD round 45 — PRs #237+#238 merged; agent-G active; T-289…
ms609 Mar 28, 2026
8283afb
chore: S-RED focus 30 clean (ts_drift + ts_fitch/tbr post-merge); age…
ms609 Mar 28, 2026
2784432
fix(T-289f): update Hamilton script to use cpp-search — feature/tbr-b…
ms609 Mar 28, 2026
16842b2
chore: agent-G T-289f diagnosis + T-290c complete; resubmit pending
ms609 Mar 28, 2026
da8d24e
docs: update AGENTS.md wagnerStarts section with T-290c empirical fin…
ms609 Mar 28, 2026
9e79ec3
chore: S-RED focus 31 — ts_prune_reinsert.cpp; filed G-006 (nni_full …
ms609 Mar 28, 2026
6a7ded4
chore: T-289f Stage 5 analysis — pr_nni wins 7/10 EB, not preset-enabled
ms609 Mar 29, 2026
7aeff18
docs: add seed count benchmarking methodology to strategies.md
ms609 Mar 29, 2026
a159311
feat(diag): Phase 1 TBR clip-ordering diagnostic instrumentation
ms609 Mar 29, 2026
9c8a359
chore: record TBR clip-ordering experiment outcome (PA-001)
ms609 Mar 29, 2026
d0df608
chore: record XSS↔TBR cycling experiment results (PA-002)
ms609 Mar 29, 2026
4a549eb
feat(T-289f): enable pruneReinsertCycles=5 + NNI polish in large preset
ms609 Mar 29, 2026
70a3bd4
fix(G-006): skip NNI polish when constraints active in prune_reinsert…
ms609 Mar 29, 2026
589d27d
chore: add remote-jobs.md for tracking async Hamilton/GHA jobs
ms609 Mar 29, 2026
6aeac51
chore: record targeted post-clip sector search results (PA-003)
ms609 Mar 29, 2026
5a060b9
feat: TBR clip ordering strategies (Phase 2 — complete)
ms609 Mar 29, 2026
ca8f4f0
docs: add @param clipOrder to SearchControl; fix stale \usage block
ms609 Mar 29, 2026
3cf476d
chore: agent F — F-030 complete (PR #239 clip-ordering phase 2)
ms609 Mar 29, 2026
72fce2e
Merge branch 'cpp-search' into feature/weighted-clip-order
ms609 Mar 29, 2026
6972444
Merge pull request #239 from ms609/feature/weighted-clip-order
ms609 Mar 29, 2026
14ff3f9
chore: agent F — IDLE; TS-WeightClip worktree deregistered
ms609 Mar 29, 2026
f16e44c
Concordance doc
ms609 Apr 9, 2026
9b23311
Goloboff2021 [-b]
ms609 Apr 9, 2026
5d987a1
Goloboff2021-b
ms609 Apr 9, 2026
066fb14
Concordance refs
ms609 Apr 9, 2026
f675172
Handle DNA in QuartetConcordance
ms609 Apr 10, 2026
b99265d
ConcordanceTable margins
ms609 Apr 17, 2026
9ac78ba
characterwise
ms609 Apr 23, 2026
571ed05
Update .Rbuildignore
ms609 Apr 23, 2026
a4627d2
User issues
ms609 May 4, 2026
0b28d9e
memchex
ms609 May 6, 2026
a941956
Simplify AGENTS.md
ms609 May 6, 2026
d009425
a. triage
ms609 May 6, 2026
89fbd4e
v6
ms609 May 6, 2026
319194e
Fix T-293/T-300: guard concordance reactive against taxon mismatch
ms609 May 7, 2026
9a6ac50
Dynamic limit for perturbStopFactor
ms609 May 7, 2026
b189017
roxygen2 v8
ms609 May 7, 2026
1bd54d1
testthat/edition: 2
ms609 May 7, 2026
db92b71
Avoid full_rescore unless needed
ms609 May 7, 2026
17f49c3
Deglitch UI
ms609 May 7, 2026
73cb644
rm flicker
ms609 May 7, 2026
ef3cefa
Clarify intent
ms609 May 7, 2026
dbf4787
Reorder
ms609 May 7, 2026
190ec25
Top/right margins on ConcordanceTable
ms609 May 7, 2026
f006918
mod-consensus rogue floor
ms609 May 8, 2026
08ddc58
Merge branch 'cpp-search' of https://github.com/ms609/TreeSearch into…
ms609 May 8, 2026
c000e9f
/dispatch implementation
ms609 May 8, 2026
4e83fa9
Update dispatch.sh
ms609 May 8, 2026
055f5c2
rm positai old files (goods extracted)
ms609 May 8, 2026
8233902
Merge branch 'main' into cpp-search
ms609 May 11, 2026
bdb5454
Ignore *.sh
ms609 May 12, 2026
17c71bb
avoid with_pdf
ms609 May 12, 2026
b290809
python v
ms609 May 12, 2026
790278d
feat(T-301): enable testthat edition 3; fix expect_equivalent and con…
ms609 May 12, 2026
1ca3466
debug /dispatch
ms609 May 12, 2026
cc95479
restrict worktrees to ../worktrees/ and protect main checkout
ms609 May 12, 2026
91f36ec
reap: proactively poll GHA when nothing is ETA-ready
ms609 May 12, 2026
0a41b09
fix(T-301): resolve testthat edition 3 failures (64 → expected 0)
ms609 May 12, 2026
8b0884f
dispatch: replace with thin stubs pointing to shared skill
ms609 May 12, 2026
0f75d2c
fix(T-301): use unrooted swappers in Ratchet test for optimal tree di…
ms609 May 12, 2026
a25f181
chore: move T-301 to completed-tasks
ms609 May 12, 2026
fbc1a8c
fix(test-CustomSearch): remove undefined referenceTree assertions
ms609 May 12, 2026
f58eef3
Merge pull request #241 from ms609/feature/testthat-3
ms609 May 12, 2026
c5e8db7
memcheck needs
ms609 May 12, 2026
e2258eb
Concordance testing tidy
ms609 May 12, 2026
c5091a8
Shiny bugfixes
ms609 May 12, 2026
87f1abe
coord(d5): T-302 complete — LengthAdded negative delta fix queued GHA…
ms609 May 13, 2026
72dc274
coord(T-298): update status to PR #242 (d8)
ms609 May 13, 2026
93ec0ec
red-team log
ms609 May 13, 2026
72eff08
progress reporting
ms609 May 13, 2026
a204542
Accept fractional per-character weights via attr(dataset, "weight")
ms609 May 17, 2026
a573450
Add fractional-weights.R to DESCRIPTION Collate field
ms609 May 17, 2026
4eef5f9
Guard against int overflow when sum(weights) > INT_MAX
ms609 May 17, 2026
6516528
Fix R CMD check warnings: declare withr, add Rcpp to WORDLIST
ms609 May 17, 2026
ea44ea1
Replace withr::with_options() with base options()/on.exit()
ms609 May 17, 2026
cd0ba6e
Add missing #include <climits> for INT_MAX in ts_resample.cpp
ms609 May 17, 2026
3a02efb
lift 256-draw cap in MaddisonSlatkin ValidDrawsCache
ms609 May 18, 2026
cdbe8e3
ci(ASan): drop rlang continue-on-error shim
ms609 May 18, 2026
96e1c43
docs(pkgdown): index inapplicable and search-algorithm vignettes
ms609 May 18, 2026
f509c60
seed
ms609 May 18, 2026
e2be4e6
docs(vignette): drop phangorn from inapplicable example
ms609 May 18, 2026
77ac344
docs(vignette): gate Rogue chunks in tree-search
ms609 May 18, 2026
99e2467
docs(vignette): gate protoclust in tree-space
ms609 May 18, 2026
86dcea9
docs(vignette): gate Rogue calls in tree-space
ms609 May 18, 2026
e2e4a6f
1260 for 1/X fractions
ms609 May 18, 2026
0094db2
Merge pull request #243 from ms609/autopart/fractional-weights
ms609 May 18, 2026
3c9daa8
+shinylive
ms609 May 18, 2026
73ec750
optimize safety
ms609 May 18, 2026
b9aabb4
w_mult safety
ms609 May 18, 2026
7a996d6
sssshhh
ms609 May 18, 2026
995a13a
Merge branch 'main' into cpp-search
ms609 May 18, 2026
fde200b
/profile init
ms609 May 18, 2026
b186e80
fix(progress): guard R_FlushConsole behind R_Interactive
ms609 May 18, 2026
44d929a
perf(sector): copy flat_blocks and all_weight_one in build_reduced_da…
ms609 May 18, 2026
1e3fc9a
perf(tbr): incremental rescore for SPR accept moves (T-300)
ms609 May 18, 2026
b7303ee
revert(T-300): remove broken incremental rescore — diff=-3 + stack sm…
ms609 May 19, 2026
2832c06
test: fix data() isolation bug in stopping and xpiwe tests
ms609 May 19, 2026
c504ea8
fix(drift): restore topology before build_postorder on RFD re-apply f…
ms609 May 19, 2026
f531bbc
perf(T-300): dirty-set incremental rescore for SPR accept
ms609 May 19, 2026
b67db1a
chore(profile): round 2 — T-300 baseline (S-PROF area #4)
ms609 May 19, 2026
9b9b170
docs: update RESUME.md post T-300 (b67db1a1)
ms609 May 19, 2026
2be8228
test(T-300): DEBUG_NNI_RESCORE cross-check, EW-only
ms609 May 19, 2026
3df9088
fix(nni): correct IW score computation in incremental rescore
ms609 May 19, 2026
1bd1346
PaintCharacters() draft
ms609 May 19, 2026
44a4ebe
chore(T-300): remove DEBUG_RESCORE + DEBUG_NNI_RESCORE scaffolding
ms609 May 19, 2026
2148483
Concordance paint-swatch
ms609 May 19, 2026
014ccde
feat(T-300 NA): dirty-set rescore for NA datasets
ms609 May 19, 2026
df7dd54
docs(PaintCharacters): add ConcordanceTable example call
ms609 May 19, 2026
012b6d2
Remotes: ms609/TreeTools,
ms609 May 19, 2026
2b6b6be
fix(progress): replace non-API R_Interactive with portable isatty()
ms609 May 19, 2026
b2652a0
Spelling
ms609 May 19, 2026
d24cd7a
T-302: fix LengthAdded negative delta (qmApp scalar) (#244)
ms609 May 19, 2026
7185b54
spell
ms609 May 19, 2026
5b210fd
chore(T-300 NA): remove DEBUG_NA_RESCORE scaffolding
ms609 May 19, 2026
221a38a
docs(T-300): record Zhu2013 NA dirty-set perf baseline
ms609 May 19, 2026
f281356
test(nni): pin IW score returned by nni_search to independent recompu…
ms609 May 19, 2026
dcdd401
feat(ls): least-squares distance scoring for the C++ search kernel (#…
ms609 Jun 1, 2026
7ad84be
-stdout
ms609 Jun 1, 2026
e8b318c
PolEscapa ambiguity fix
ms609 Jun 1, 2026
777ef49
Seed
ms609 Jun 1, 2026
63cd137
Update red-team.md
ms609 Jun 1, 2026
ca5d8e3
Update to-do.md
ms609 Jun 1, 2026
541eff3
redoc
ms609 Jun 1, 2026
61d275f
merge to 2.0.0
ms609 Jun 1, 2026
ce93d64
feat: WideSample solves the MMDP via MaxMin (quality tiers, matrix-free)
ms609 Jun 5, 2026
5094abf
→Gonzalez()
ms609 Jun 8, 2026
e5ff294
Sectors?
ms609 Jun 8, 2026
810913a
Simplify news
ms609 Jun 11, 2026
437ca66
WIP
ms609 Jun 11, 2026
828d8e6
Updated MaxMin
ms609 Jun 11, 2026
05c6552
simplify setting
ms609 Jun 11, 2026
e27784f
No need to sort - original order is arbitrary
ms609 Jun 11, 2026
94c7b08
coord: T-303 → PR #247 (t303)
ms609 Jun 12, 2026
9828c52
coord: T-304 -> PR #248 (t304)
ms609 Jun 15, 2026
7a7801f
MaxMin param naming
ms609 Jun 15, 2026
da59271
T-305
ms609 Jun 15, 2026
07527b2
coord: T-306 -> PR #249 (t306)
ms609 Jun 15, 2026
6fe7d94
m→k
ms609 Jun 15, 2026
84effc2
Suggests: highs
ms609 Jun 15, 2026
4607d21
MaxMin→Suggests
ms609 Jun 15, 2026
9c4e3bb
Doc (spelling)
ms609 Jun 15, 2026
589c182
Update WideSample.Rd
ms609 Jun 15, 2026
1a2707f
Simplify
ms609 Jun 15, 2026
7a4d0a8
perf(quartet_concordance): hoist buffer resize outside split loop (#242)
ms609 Jun 15, 2026
0082ea8
dup keys
ms609 Jun 15, 2026
b097bba
Merge branch 'cpp-search' of https://github.com/ms609/TreeSearch into…
ms609 Jun 15, 2026
627a62c
T-303: sector heuristic on HSJ/XFORM — guard verification + regressio…
ms609 Jun 15, 2026
3cb59be
missed `progress` arg
ms609 Jun 15, 2026
1bf3f90
fix(tests): silence multi-warning leakage under testthat edition 3
ms609 Jun 15, 2026
bb42134
T-304: regression test for T-300 dirty-set incremental rescore (#248)
ms609 Jun 15, 2026
317c1e4
HSJ: make scoring invariant to phyDat level ordering (T-307 primary +…
ms609 Jun 15, 2026
a15d740
T-306: gate HSJ/XFORM SPR/NNI accept-paths to full_rescore (#249)
ms609 Jun 15, 2026
841eead
hgihs workaround for R4.1
ms609 Jun 15, 2026
55e304b
Red-team CRAN hardening: T-309–T-315 + CRAN gates + UB (stable surfac…
ms609 Jun 16, 2026
9876fc6
Update red team / profiling structure
ms609 Jun 16, 2026
3160397
Ignore temp libs
ms609 Jun 16, 2026
70cba13
No maxmin prog
ms609 Jun 16, 2026
e4c67a6
VpHashSet
ms609 Jun 16, 2026
6888b00
Completed-task log
ms609 Jun 16, 2026
7944e10
Update agent-brief.md
ms609 Jun 16, 2026
ebda9ea
Update findings.md
ms609 Jun 16, 2026
837eccf
profiling structs
ms609 Jun 16, 2026
95d93ba
+Segoe UI fallback
ms609 Jun 16, 2026
f0e7c70
red-team area 8: gate 2 Tier-2 tests on CRAN; file T-322
ms609 Jun 16, 2026
d93b0a3
fix T-322: pass min_steps to wagner NA+IW cross-check, matching produ…
ms609 Jun 16, 2026
8730845
red-team area 9: fix WGN-DUP + POL-QM-EMPTY (Wagner/PolEscapa input c…
ms609 Jun 16, 2026
a3ec4cf
Driven search: stallEscalateFactor option + TBR kernel speedups
ms609 Jun 16, 2026
15a4ac0
red-team area 9: file T-323 (kernel OOB) + T-324 (silent retry); reco…
ms609 Jun 16, 2026
2be3912
red-team: refute area-9 high-sev signal (TreeLength NA+IW ≠ kernel)
ms609 Jun 16, 2026
be63625
Phase 0: instrument candidates-examined + TNT head-to-head harness
ms609 Jun 16, 2026
0818ad5
Fast-iteration tooling + Phase 0c/1 baseline results
ms609 Jun 16, 2026
0856a32
Phase 2: lever sweeps + conclusion (EW-Fitch gap at parameter-tuning …
ms609 Jun 16, 2026
28cd84c
Add opt-in strategy = "intensive" preset (thorough + extra Wagner sta…
ms609 Jun 16, 2026
b46c9ce
Phase 3: structural search rewrite ruled out (gap at floor); pivot to…
ms609 Jun 16, 2026
73c1879
Phase 3 plan: correct stale per-candidate-cost figures (fill already …
ms609 Jun 16, 2026
73fb7b8
Add TNT 1.6 settings survey (TTT across 14 configs x 6 datasets)
ms609 Jun 17, 2026
935b165
Add scaling survey extension for MorphoBank datasets (100-205t)
ms609 Jun 17, 2026
a33205c
Survey results
ms609 Jun 17, 2026
c641b68
Add TNT 1.6 scaling survey results (100-205t MorphoBank datasets)
ms609 Jun 17, 2026
d5ea921
Add .tnt-survey/ to .gitignore
ms609 Jun 17, 2026
ec11830
Clarify 205t failure modes in scaling survey report
ms609 Jun 17, 2026
8347e8b
Remove accidentally-tracked .tnt-survey/ staging files
ms609 Jun 17, 2026
6db7f2f
fix(driven): make ratchetCycles = 0 truly disable ratchet
ms609 Jun 17, 2026
2dc94f7
feat(sector): TNT-faithful sectorial-search levers (RSS/XSS)
ms609 Jun 18, 2026
832f56b
debug(tbr): add TS_REVERT_CHECK clip-undo invariant diagnostic
ms609 Jun 18, 2026
d44f33c
dev(profiling): round-4 notes, VTune Makevars, ignore profiling artif…
ms609 Jun 18, 2026
fbf8de3
chore(gitignore): ignore regenerable dev/benchmarks outputs
ms609 Jun 18, 2026
1eaf938
test(sector): add sector-resolve regression test
ms609 Jun 18, 2026
238661c
dev(benchmarks): TNT-parity investigation scripts, plans, and t0 fixt…
ms609 Jun 18, 2026
2b299e4
fix(wagner): replace union-of-finals with exact directional edge set
ms609 Jun 18, 2026
bf5b954
Merge origin/cpp-search (TNT scaling survey) into local cpp-search (W…
ms609 Jun 18, 2026
10389fe
docs(plan): mark Wagner insertion-cost fix shipped on cpp-search (sca…
ms609 Jun 18, 2026
93071ca
fix(sector): exact directional edge-set scoring in build_ras_sector
ms609 Jun 18, 2026
c02fb0b
dev(benchmarks): TBR false-convergence probe (good vs poor starts)
ms609 Jun 18, 2026
69287c0
dev(benchmarks): multi-dataset rss x rasStarts sweep with timing
ms609 Jun 18, 2026
b5509be
dev(benchmarks): time-matched rasStarts=1 vs 3 (rss phase)
ms609 Jun 18, 2026
e69765f
feat(search): rasStarts=3 in the opt-in intensive preset (TNT-faithful)
ms609 Jun 18, 2026
badb73b
feat(tbr): opt-in true unrooted TBR (reroot-at-convergence) + complet…
ms609 Jun 18, 2026
1dea229
dev(benchmarks): full-search time-matched rasStarts gate (thorough)
ms609 Jun 18, 2026
542da76
revert(search): drop rasStarts=3 from intensive preset (full-search g…
ms609 Jun 18, 2026
72e1277
docs(plan): Phase 4 — EW-Fitch score gap CLOSED; the floor was a cost…
ms609 Jun 18, 2026
b96e281
dev(benchmarks): Hamilton TS-vs-TNT per-dataset wall-clock timing
ms609 Jun 18, 2026
927b6eb
fix(bench): hamilton_timing.R sets .libPaths so TreeTools resolves
ms609 Jun 18, 2026
cfac132
feat(tbr): direct in-pass unrooted TBR — exact & complete for EW + IW
ms609 Jun 18, 2026
199c748
Merge branch 'cpp-search' into claude/competent-chaum-6ecb56
ms609 Jun 18, 2026
b7c0cab
feat(tbr): exact-verify-at-convergence — complete unrooted TBR for NA
ms609 Jun 18, 2026
fade85c
dev(benchmarks): post-fix TS-vs-TNT wall-clock timing results
ms609 Jun 18, 2026
3c40c80
dev(benchmarks): convergence-stop + treespace-sampling diagnostics
ms609 Jun 18, 2026
25e35be
feat(tbr): unrooted=TRUE becomes the default
ms609 Jun 18, 2026
02e88aa
perf(tbr): memoize exact_verify_sweep optima, keyed by weighting regime
ms609 Jun 18, 2026
8494d18
docs(tbr): correct tree_topo_hash root-dependence comment (was backwa…
ms609 Jun 18, 2026
78b7414
test(tbr): regression guard for exact_verify cache regime-keying
ms609 Jun 19, 2026
00d73d6
perf(tbr): skip per-clip zero-fill in compute_insertion_edge_sets
ms609 Jun 19, 2026
df5a15f
feat(tbr): complete NA unrooted-TBR — enumerate the root edge at conv…
ms609 Jun 19, 2026
1b16f0a
docs(tbr): NA cheap-directional-scoring feasibility analysis (task #18)
ms609 Jun 19, 2026
da0f203
ci(asan): exclude Rogue from ASAN dep install to avoid ~30-min Rfast …
ms609 Jun 19, 2026
d6fa512
fix(parallel): drop MinGW emutls thread_local on the worker NA path (…
ms609 Jun 20, 2026
b18508b
Acronym syntax / redoc
ms609 Jun 20, 2026
e44d786
stale / miscommitted
ms609 Jun 20, 2026
c930533
agent-*
ms609 Jun 20, 2026
beb5213
perf(tbr): hoist per-clip getenv/invariant checks out of tbr_search h…
ms609 Jun 20, 2026
f9ca332
Merge remote-tracking branch 'origin/cpp-search' into cpp-search
ms609 Jun 20, 2026
198 changes: 198 additions & 0 deletions .AGENTS/memory/architecture.md
@@ -0,0 +1,198 @@
# Architecture Reference

Load this when: editing `src/ts_*.cpp`/`.h`, adding Rcpp exports, reading
the R-level API, or reviewing design decisions.

---

## R-level API

| Function | Engine | Purpose |
|----------|--------|---------|
| `MaximizeParsimony()` | C++ driven search | Primary search (EW, IW, profile, constraints) |
| `Morphy()` | R-loop + MorphyLib | Legacy search (custom stopping, per-iteration callbacks) |
| `MaximizeParsimony2()` | — | Deprecated alias for `MaximizeParsimony()` |
| `Resample()` | C++ | Jackknife/bootstrap resampling |
| `SuccessiveApproximations()` | C++ | Successive approximations weighting |
| `TreeLength()` | C++ `ts_fitch_score` | Score one or more trees |
| `FastCharacterLength()` | C++ `ts_char_steps` | Per-character step counts |
| `AdditionTree()` | C++ `ts_wagner_tree` | Wagner tree construction |
| `RandomTreeScore()` | C++ (phyDat) or MorphyLib (morphyPtr) | Score a random tree |
| `TaxonInfluence()` | C++ via `MaximizeParsimony()` | Per-taxon search |
| `SearchControl()` | — | Expert parameter constructor for `MaximizeParsimony()` |
| `ParsSim()` | Pure R | Simulate datasets under parsimony (EW/IW/profile) |

`MaximizeParsimony()` has a backward-compatibility shim: passing old
Morphy-style parameters (`ratchIter`, `tbrIter`, etc.) triggers a deprecation
warning and delegates to `Morphy()`. Scheduled for removal in 2028.

---

## C++ module map

| Module | Header/Source | Purpose |
|--------|--------------|---------|
| Fitch scoring | `ts_fitch.h/.cpp` | Downpass, uppass, incremental, indirect |
| NA scoring | `ts_fitch_na.h` | Three-pass inapplicable algorithm (Brazeau et al. 2019) |
| NA incremental | `ts_fitch_na_incr.h` | Incremental NA-aware scoring for TBR/drift |
| SIMD | `ts_simd.h` | SSE2/NEON portability layer for bit-parallel ops |
| Data | `ts_data.h/.cpp` | `DataSet`, `CharBlock`, `build_dataset`, simplification |
| Tree | `ts_tree.h/.cpp` | `TreeState`, topology manipulation, `PreallocUndo` |
| Constraint | `ts_constraint.h/.cpp` | Topological constraint enforcement |
| TBR | `ts_tbr.h/.cpp` | TBR search (with sector_mask for CSS) |
| SPR/NNI | `ts_search.h/.cpp` | SPR and NNI search (standalone, not in driven pipeline) |
| Ratchet | `ts_ratchet.h/.cpp` | Perturbation (zero/upweight/mixed, adaptive) |
| Drift | `ts_drift.h/.cpp` | Accept suboptimal moves within AFD/RFD limits |
| Wagner | `ts_wagner.h/.cpp` | Greedy addition tree (incremental scoring, NA-aware) |
| Sectorial | `ts_sector.h/.cpp` | RSS (conflict-guided), XSS, CSS; from-above HTU |
| Fuse | `ts_fuse.h/.cpp` | Tree fusing (in-place exchange) |
| Pool | `ts_pool.h/.cpp` | Dedup, eviction, consensus hash, split frequency table |
| Splits | `ts_splits.h/.cpp` | Bipartition computation, comparison, `hash_single_split()` |
| Driven | `ts_driven.h/.cpp` | Multi-replicate orchestrator |
| Resample | `ts_resample.h/.cpp` | Jackknife, bootstrap, successive approximations |
| Parallel | `ts_parallel.h/.cpp` | `std::thread` inter-replicate parallelism |
| RNG | `ts_rng.h/.cpp` | Thread-safe RNG (`thread_local` dispatch) |
| Simplify | `ts_simplify.h/.cpp` | Character compression and uninformativeness checks |
| Collapsed | `ts_collapsed.h/.cpp` | Zero-length edge detection for clip skipping |
| NNI perturb | `ts_nni_perturb.h/.cpp` | Stochastic NNI-perturbation (IQ-TREE-style topology escape) |
| HSJ scoring | `ts_hsj.h/.cpp` | Hopkins & St. John hierarchy scoring |
| Sankoff | `ts_sankoff.h/.cpp` | Sankoff step-matrix scoring (x-transform) |
| Rcpp bridge | `ts_rcpp.cpp` | All Rcpp-exported functions |

---

## Scoring modes

`ScoringMode` enum in `ts_data.h`: `EW`, `IW`, `PROFILE`, `XFORM`.
- **EW**: standard Fitch parsimony
- **IW**: implied weights via `e/(k+e)` where `e = steps - min_steps`
- **PROFILE**: lookup in `info_amounts` table (structurally identical to IW pipeline)
- **XFORM**: Fitch(non-hierarchy) + Sankoff(recoded composite characters)

Profile mode sets `ds.concavity = 1.0` (finite sentinel) so existing
`isfinite()` checks activate the weighted pipeline without code duplication.
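
The IW cost above can be sketched as a small function. This is only an illustration: `character_cost_sketch` is a hypothetical name, not a function in the package, and treating a non-finite concavity as "equal weights" is an assumption inferred from the `isfinite()` remark rather than confirmed behaviour.

```cpp
#include <cmath>

// Hypothetical helper (not part of the package API).
// Returns the cost contributed by one character.
//  - Non-finite concavity: assumed to mean equal weights, so the cost is
//    the raw step count.
//  - Finite concavity k (assumed > 0): implied-weights cost e / (k + e),
//    where e = steps - min_steps is the number of extra steps.
inline double character_cost_sketch(int steps, int min_steps,
                                    double concavity) {
  if (!std::isfinite(concavity)) {
    return static_cast<double>(steps);
  }
  const double e = static_cast<double>(steps - min_steps);
  return e / (concavity + e);
}
```

With `k = 10`, a character needing 5 steps against a minimum of 3 has `e = 2` and costs `2 / 12`; each additional extra step adds less cost than the previous one.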

---

## Parallelism design

- `std::thread` (not OpenMP) to avoid R memory allocator conflicts
- Per-thread: `DataSet` copy, `ConstraintData` copy, `std::mt19937` RNG
- Shared: `ThreadSafePool` (mutex-guarded), atomic stop flag
- Main thread: pre-generates seeds from R's RNG, polls
`R_CheckUserInterrupt()` and timeout every 200ms
- Worker threads make no R API calls — `ts_rng.h` provides `thread_local`
dispatch (null → R API for serial; set → thread-local for parallel)
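
A minimal sketch of the ownership pattern in the list above, assuming nothing about the real classes: `SketchPool`, `worker_sketch` and `run_replicates_sketch` are hypothetical stand-ins, and the "replicate" is a random number rather than a search. It shows only the shape: seeds prepared by the caller, one RNG per worker, and shared state reached solely through a mutex-guarded pool and an atomic flag.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for a mutex-guarded result pool.
class SketchPool {
 public:
  void add(int score) {
    std::lock_guard<std::mutex> lock(mutex_);
    scores_.push_back(score);
  }
  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return scores_.size();
  }

 private:
  std::mutex mutex_;
  std::vector<int> scores_;
};

// Each worker owns its RNG and touches shared state only through the pool
// and the stop flag; it makes no calls into the host runtime.
inline void worker_sketch(std::uint32_t seed, int n_reps, SketchPool& pool,
                          const std::atomic<bool>& stop) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> fake_score(100, 120);
  for (int i = 0; i < n_reps && !stop.load(); ++i) {
    pool.add(fake_score(rng));  // placeholder for one search replicate
  }
}

// The caller supplies pre-generated seeds, one per thread, and joins all
// workers before reading the pool. Returns the number of results stored.
inline std::size_t run_replicates_sketch(
    const std::vector<std::uint32_t>& seeds, int reps_per_thread) {
  SketchPool pool;
  std::atomic<bool> stop{false};  // a real driver would set this on interrupt
  std::vector<std::thread> threads;
  for (std::uint32_t seed : seeds) {
    threads.emplace_back(worker_sketch, seed, reps_per_thread,
                         std::ref(pool), std::cref(stop));
  }
  for (auto& t : threads) t.join();
  return pool.size();
}
```

In this sketch the stop flag is never set, so three seeds with four replicates each leave twelve entries in the pool. A real main thread would poll for interrupts and timeouts while the workers run, then set the flag.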

---

## Scoring notes

- `.h` file changes (`ts_fitch_na.h`, `ts_fitch_na_incr.h`) may require
`touch src/ts_fitch.cpp` before rebuild if the build system doesn't track
header dependencies.
- Incremental scoring is a **screening heuristic** for candidate selection;
`full_rescore()` / `score_tree()` is always authoritative.
- See `.positai/expertise/fitch-scoring.md` for detailed invariants:
uppass correctness proof, NA staleness analysis, `upweight_mask` audit.

---

## Constraint enforcement

- `build_constraint()` reads R split matrix with **column-major** indexing:
`split_matrix[s + n_splits * t]`.
- Wagner uses LCA-based constraint mapping (`wagner_map_constraint_nodes`)
since splits aren't fully present during incremental construction.
- Wagner has a posthoc retry loop (up to 100 random addition orders) as a
safety net for edge cases.
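
The column-major indexing rule can be illustrated as follows. The helper names are hypothetical and the element type (`int`, non-zero meaning "tip is in the split") is an assumption; only the index expression `s + n_splits * t` comes from the note above.

```cpp
#include <cstddef>
#include <vector>

// Read entry (split s, tip t) of an n_splits x n_tips matrix stored
// column-major, which is how R lays out matrices in memory.
inline int split_entry_sketch(const std::vector<int>& split_matrix,
                              std::size_t n_splits, std::size_t s,
                              std::size_t t) {
  return split_matrix[s + n_splits * t];
}

// Collect the indices of the tips that belong to split s.
inline std::vector<std::size_t> tips_in_split_sketch(
    const std::vector<int>& split_matrix, std::size_t n_splits,
    std::size_t n_tips, std::size_t s) {
  std::vector<std::size_t> tips;
  for (std::size_t t = 0; t < n_tips; ++t) {
    if (split_entry_sketch(split_matrix, n_splits, s, t) != 0) {
      tips.push_back(t);
    }
  }
  return tips;
}
```

For a 2 x 3 matrix whose rows are `1 1 0` and `0 1 1`, the flat storage is `{1, 0, 1, 1, 0, 1}`. Reading it with row-major indexing (`s * n_tips + t`) would silently return the wrong tips, which is why the convention is worth stating.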

---

## Exported Rcpp functions

All registered in `ts_rcpp.cpp` and `TreeSearch-init.c`. Run
`Rscript check_init.R` to verify consistency.

| Function | Module | Purpose |
|----------|--------|---------|
| `ts_fitch_score` | ts_fitch | Score a tree |
| `ts_char_steps` | ts_rcpp | Per-pattern step counts (with simplification offsets) |
| `ts_na_debug_char` | ts_fitch_na | Per-node debug for a single pattern |
| `ts_na_char_steps` | ts_fitch_na | Per-pattern step counts (raw, no offsets) |
| `ts_debug_clip` | ts_fitch | Debug SPR clip/regraft |
| `ts_test_indirect` | ts_fitch | Debug indirect length |
| `ts_nni_search` | ts_search | NNI hill-climbing |
| `ts_spr_search` | ts_search | SPR hill-climbing |
| `ts_tbr_search` | ts_tbr | TBR with plateau exploration |
| `ts_ratchet_search` | ts_ratchet | Ratchet perturbation |
| `ts_drift_search` | ts_drift | Drift search |
| `ts_wagner_tree` | ts_wagner | Wagner tree (specified addition order) |
| `ts_random_wagner_tree` | ts_wagner | Wagner tree (random order) |
| `ts_compute_splits` | ts_splits | Bipartition splits from edge matrix |
| `ts_trees_equal` | ts_splits | Compare two trees |
| `ts_pool_test` | ts_pool | Pool deduplication test |
| `ts_tree_fuse` | ts_fuse | Fuse two trees |
| `ts_sector_diag` | ts_sector | Sectorial search diagnostics |
| `ts_rss_search` | ts_sector | Random Sectorial Search |
| `ts_xss_search` | ts_sector | Exclusive Sectorial Search |
| `ts_driven_search` | ts_driven | Full driven search |
| `ts_resample_search` | ts_resample | One jackknife/bootstrap replicate |
| `ts_successive_approx` | ts_resample | Successive approximations |
| `ts_parallel_resample` | ts_parallel | Batch resample with parallelism |
| `ts_bench_tbr_phases` | ts_rcpp | TBR phase timing diagnostic |
| `ts_hsj_score` | ts_hsj | HSJ hierarchy scoring |

---

## Key design decisions

1. **PreallocUndo** (`ts_tree.h`): Pre-allocated flat buffers for TBR/drift
undo stack. Uses `grow()` to dynamically expand when capacity exceeded
(NA uppass saves both internal nodes and tips). Initial capacity `3 * n_node`.

2. **TBR symmetry breaking** (`ts_tbr.cpp`): FNV-1a hash deduplication of
`virtual_prelim` vectors to skip redundant rerooting evaluations.

3. **Bounded indirect scoring**: All search modules use `_bounded` variants
that bail out when accumulated score exceeds best candidate.

4. **Profile parsimony**: Reuses IW indirect pipeline unchanged; only delta
precomputation differs. `ds.concavity = 1.0` sentinel activates weighted
path. Max 2 informative states per character; inapplicable → ambiguous.

5. **MPT enumeration**: Post-search TBR plateau walk from all pool seeds.
`tbr_search()` accepts optional `TreePool* collect_pool` parameter.

6. **All-ambiguous phyDat guard**: `TreeLength()` and `MaximizeParsimony()`
check for `levels = NULL` / 0-column contrast matrix before calling C++.

7. **From-above HTU for sectorial search** (`ts_sector.cpp`):
`compute_from_above_for_sector()` computes `from_above[sector_root]` —
the Fitch state-set the rest of the tree sends *down* to the sector
boundary, excluding the sector's own contribution. Used instead of
`final_[parent]` in `build_reduced_dataset()`. O(depth × total_words).

8. **Split frequency table** (`ts_pool.h/.cpp`): `SplitFrequencyTable` maps
per-split FNV-1a hash → occurrence count across best-score pool trees.
Used by conflict-guided RSS to weight sector selection. The same FNV-1a
hash (`hash_single_split()` in `ts_splits.h`) is used by consensus
hashing and split frequency counting — must stay consistent.

9. **Consensus-stability hash** (`ts_pool.cpp`): XOR of FNV-1a hashes of
splits present in ALL best-score trees. Updated after each replicate.
Hash collision false-matches are conservative (over-count stability).

10. **Diversity-aware pool eviction** (`ts_pool.cpp`): When the pool is full
and a new tree ties the worst score, the entry most similar to the new
tree (most shared splits, counted via per-split FNV-1a hash set
membership) is evicted. Falls back to arbitrary worst entry when the
new tree is strictly better.

11. **Cross-replicate consensus constraint tightening** (`ts_driven.cpp`):
When `consensus_constrain = true` and no user constraint is supplied,
after ≥5 replicates, unanimous pool splits are extracted and enforced
as topological constraints via `build_constraint_from_bitsets()`. The
TBR/SPR search then avoids breaking established consensus clades.
Constraints are cleared and rebuilt whenever the best score changes.
Sector/fuse operations do not enforce auto-constraints.
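
Decisions 8 and 9 both rest on a per-split FNV-1a hash. The sketch below shows the standard 64-bit FNV-1a algorithm and an XOR combination over a set of splits. It is an assumption that splits are stored as `uint64_t` words and hashed byte by byte; the real `hash_single_split()` may differ, and all names ending in `_sketch` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Standard 64-bit FNV-1a over a byte buffer.
inline std::uint64_t fnv1a_64_sketch(const unsigned char* data,
                                     std::size_t len) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (std::size_t i = 0; i < len; ++i) {
    h ^= data[i];              // xor in the next byte first (the "1a" order)
    h *= 1099511628211ULL;     // then multiply by the FNV prime
  }
  return h;
}

// Hash one split, assumed here to be a bitset packed into 64-bit words.
inline std::uint64_t hash_split_sketch(
    const std::vector<std::uint64_t>& words) {
  return fnv1a_64_sketch(
      reinterpret_cast<const unsigned char*>(words.data()),
      words.size() * sizeof(std::uint64_t));
}

// XOR the hashes of every split in the set. The caller is assumed to have
// already filtered to splits present in all best-score trees.
inline std::uint64_t consensus_hash_sketch(
    const std::vector<std::vector<std::uint64_t>>& unanimous_splits) {
  std::uint64_t acc = 0;
  for (const auto& split : unanimous_splits) {
    acc ^= hash_split_sketch(split);
  }
  return acc;
}
```

XOR is commutative and associative, so the combined value does not depend on the order in which splits are visited. That property only holds if every caller hashes a split the same way, which is the reason the note insists the hash stay consistent across modules.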
174 changes: 174 additions & 0 deletions .AGENTS/memory/benchmarking.md
@@ -0,0 +1,174 @@
# Benchmarks and Profiling

Load this when: running benchmarks, interpreting benchmark results,
doing VTune profiling, or selecting datasets for strategy validation.

See also: `search-algorithms.md` (NNI, biased Wagner, outer cycles results),
`search_strategy.md` (presets, ratchet tuning).

---

## VTune driver scripts — dry-run first

**Always test a VTune driver script with plain `Rscript` before launching
VTune.** Software-sampling overhead can be 5–20×; if the bare script takes
30s, VTune may need 10 min. Target < 5s bare run for a lite driver.

MaddisonSlatkin is exponential in tip count — even n=20 with k=3 can take
seconds per call. Use small n (≤15 for k=3, ≤12 for k=4, ≤9 for k=5)
and few iterations for VTune drivers.

---

## MorphoBank external benchmark corpus

The neotrans repo (`../neotrans/inst/matrices/`) contains ~800 MorphoBank
NEXUS matrices. It complements the 14 bundled datasets and 1 large-tree dataset.

**Catalogue:** `dev/benchmarks/mbank_catalogue.csv` (659 usable matrices
after ntax≥20 filter and dedup). Regenerate with
`Rscript dev/benchmarks/build_mbank_catalogue.R`.

**Train/validation split:** Matrices whose MorphoBank project number is
divisible by 5 are **validation** (124 matrices, ~19%). All others are
**training** (535 matrices). The 7 `syab*` files are always training.
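
The split rule can be written as a predicate. This is a sketch under assumptions: the function name is hypothetical, and identifying the `syab*` files by a file-name prefix is a guess at how they are distinguished, not something the catalogue code is known to do.

```cpp
#include <string>

// Hypothetical predicate for the train/validation rule described above.
// Returns true when a matrix belongs to the validation set.
inline bool is_validation_sketch(const std::string& file_name,
                                 int project_number) {
  // Files whose name starts with "syab" are always training (assumed to be
  // recognisable by prefix).
  if (file_name.compare(0, 4, "syab") == 0) {
    return false;
  }
  // Otherwise: validation if the project number is divisible by 5.
  return project_number % 5 == 0;
}
```

Because the rule depends only on the project number, a matrix never changes set when the catalogue is regenerated.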

**Dedup:** Multi-file projects with ≥95% character identity on shared taxa
(≥80% taxon overlap) are flagged `dedup_drop = TRUE`. 24 near-duplicates excluded.

**IMPORTANT:** Validation results must **never** be used to guide strategy
tuning. They confirm generalization only. This is a one-way door.

**Fixed 25-matrix training sample:** `MBANK_FIXED_SAMPLE` in
`bench_datasets.R` — 7 small, 7 medium, 7 large, 4 xlarge. Selected via
max-min distance on standardized features. **Do not modify.** Used by
`benchmark_mbank_sample()`. Fitch track only.

**Fixed 20-matrix Brazeau-track sample:** `MBANK_BRAZEAU_SAMPLE` in
`bench_datasets.R` — 5 small, 6 medium, 6 large, 3 xlarge. Restricted to
training matrices with **pct_inapp ≥ 4%**. **Do not modify.**

**Key functions** (in `dev/benchmarks/bench_datasets.R`):
- `load_mbank_catalogue()` — loads metadata CSV (excludes dedup by default)
- `load_mbank_sample(cat, n, seed, split)` — stratified random sample
- `load_mbank_datasets(cat, keys)` — load specific matrices by key
- `load_mbank_brazeau_sample(cat)` — fixed 20-matrix Brazeau sample
- `has_meaningful_inapp(cat, threshold)` — filter to pct_inapp ≥ threshold

**Benchmark runners** (in `dev/benchmarks/bench_framework.R`):
- `benchmark_mbank_sample()` — fixed 25-matrix training sample (routine)
- `benchmark_mbank_sweep(split)` — full training or validation sweep
- `benchmark_mbank_validation()` — validation sweep with prominent warning

**Benchmark tracks:**

| Track | Scoring | Datasets | Purpose |
|-------|---------|----------|---------|
| **Fitch** | `fitch_mode()` | 14 bundled + `MBANK_FIXED_SAMPLE` | TNT comparison, core search quality |
| **Brazeau** | Default (Brazeau 2019) | `MBANK_BRAZEAU_SAMPLE` + bundled | NA-algorithm-specific strategy tuning |

TNT comparisons are Fitch track only.

**TNT comparison suite** lives in `../TS-TNT-bench/`. Key files:
- `dev/benchmarks/bench_tnt_compare.R` — runner (smoke/medium/full)
- `dev/benchmarks/tnt_comparison.qmd` — Quarto report
- Requires TNT 1.6 at `C:/Programs/Phylogeny/tnt/TNT-bin/tnt.exe`

Benchmark scripts live in `dev/benchmarks/`. Key files:
- `bench_regression.R` — CI regression test (score quality + timing bounds)
- `bench_framework.R` — Dataset × strategy × replicate grid
- `strategies.md` — Strategy space documentation

---

## Benchmarking methodology notes

**Metric:** When comparing strategies with different time costs (e.g.
NNI→TBR vs TBR), use **time-adjusted expected best** (TAEB) — the expected
minimum score from k = budget / time_per_rep independent replicates. Median
per-replicate score is adequate only when comparing parameter changes on a
fixed pipeline (same time-per-rep). Bootstrap estimation: sample k scores
with replacement, take the min, repeat 5000×, take the mean.
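
The bootstrap recipe above can be sketched as follows. `taeb()` and its argument names are illustrative, and the scores are made up; this is not taken from the benchmark framework.

```r
# Time-adjusted expected best: expected minimum score over the k replicates
# that fit in the time budget, estimated by bootstrap.
taeb <- function(scores, time_per_rep, budget, n_boot = 5000) {
  k <- max(1, floor(budget / time_per_rep))   # replicates affordable in budget
  mins <- replicate(n_boot, {
    # Index-based resampling avoids sample()'s special case for length-1 input
    draw <- scores[sample.int(length(scores), k, replace = TRUE)]
    min(draw)
  })
  mean(mins)
}

set.seed(1)
fast <- c(652, 650, 655, 649, 651)   # made-up scores, 2 s per replicate
slow <- c(648, 650, 647, 649, 648)   # made-up scores, 10 s per replicate
taeb(fast, time_per_rep = 2,  budget = 20)   # k = 10
taeb(slow, time_per_rep = 10, budget = 20)   # k = 2
```

A strategy with a worse median can still win on TAEB if it is cheap enough to run many more replicates within the same budget.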

**Brazeau vs EW scoring confound (T-265, 2026-03-26):** TreeSearch uses the
Brazeau et al. (2019) inapplicable algorithm by default, which penalizes
inapplicable-to-applicable transitions. TNT treats `-` as `?` (standard EW
Fitch). On 11 gap datasets, the apparent mean gap was +17.8 steps; the
actual EW-vs-EW gap is only +2.2 steps (5 datasets at 0 gap). **All TNT
comparisons MUST use `fitch_mode()` to convert inapplicable to missing**
for apples-to-apples scoring. `fitch_mode()` is defined in
`bench_intra_fuse.R` and `bench_t265_regression.R`.
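
The idea behind the conversion, shown on a plain character matrix. `inapp_to_missing()` is a hypothetical stand-in, not the `fitch_mode()` defined in the benchmark scripts, which may operate on a different data structure.

```r
# Illustrative only: recode inapplicable tokens ("-") as missing ("?"),
# which is how TNT treats them under standard EW Fitch scoring.
inapp_to_missing <- function(char_matrix) {
  out <- char_matrix
  out[out == "-"] <- "?"
  out
}

m <- matrix(c("0", "-", "1",
              "1", "?", "-"),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("taxon_a", "taxon_b"), NULL))
inapp_to_missing(m)
```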

**`maxTime` confound (2026-03-23):** `maxTime` (legacy Morphy parameter)
silently delegates to the R-loop `Morphy()` engine. Use `maxSeconds` for
the C++ driven search, which is ~10× faster at 180 tips.

**Early vs late search:** Early replicates are dominated by initial descent
quality (Wagner → local optimum); late replicates test ratchet/drift escape.
At ≤88 tips, 20s gives 10–40 replicates spanning both regimes. At 180 tips,
20s doesn't complete one replicate.

---

## Phase distribution baselines

**T-290b (2026-03-28, Brazeau-sample datasets, 30s, post-T-255 no-drift presets):**

| Phase | Fitch/EW/default | Fitch/EW/thorough | Brazeau/EW/default | Brazeau/EW/thorough |
|-------|:---:|:---:|:---:|:---:|
| Ratchet | 76% | 65% | 74% | 63% |
| TBR | 8% | 5% | 7% | 4% |
| XSS | 6% | 7% | 5% | 6% |
| RSS | 3% | 10% | 3% | 10% |
| CSS | — | 7% | — | 7% |
| Wagner | 4% | 3% | 9% | 7% |
| Final TBR | 2% | 2% | 2% | 2% |

*(Drift has been 0% in all presets since T-255.)*

**Brazeau / Fitch per-phase cost ratios (T-290b, EW):**

| Phase | default | thorough |
|-------|:-------:|:--------:|
| Wagner | **3.6×** | **3.9×** |
| Ratchet | 1.3× | 1.3× |
| RSS/CSS | 1.3× | 1.3× |
| TBR | 0.9× | 0.9× |

Wagner is the outlier. All other phases are within 0.9–1.4× of Fitch cost.

**wagnerStarts under Brazeau (T-290b/c, 2026-03-28):**
- *Multiple reps/budget*: wagnerStarts=1 and 3 are roughly equivalent; w3 is marginally better.
- *~1 rep/budget* (60s at 86t/3660c): wagnerStarts=3 is better by 564 steps.
- *0 reps/budget* (30s at 86t/3660c): wagnerStarts=1 is **better**. Brazeau
  Wagner is expensive (~4×), so 3 starts consume the budget.

Current presets are correct: thorough (w3, gets ≥1 rep at 65–119t) ✓; large (w1) ✓.

Per-candidate indirect scoring is at the memory-throughput limit (~23 ns at 75 tips).

---

## Ratchet tuning validation (2026-03-22)

Full 14-dataset comparison, optimized vs original defaults (10s budget, 3 seeds).

| Dataset | Tips | Original | Optimized | Delta (Original − Optimized) |
|---------|:---:|:---:|:---:|:---:|
| Longrich2010 | 20 | 131 | 131 | 0 |
| Vinther2008 | 23 | 79 | 79 | 0 |
| Sansom2010 | 23 | 189 | 189 | 0 |
| DeAssis2011 | 33 | 64 | 64 | 0 |
| Aria2015 | 35 | 143 | 143 | 0 |
| Wortley2006 | 37 | 494 | 491 | +3 |
| Griswold1999 | 43 | 408 | 407 | +1 |
| Schulze2007 | 52 | 165 | 164 | +1 |
| Eklund2004 | 54 | 442 | 441 | +1 |
| Agnarsson2004 | 62 | 778 | 778 | 0 |
| Zanol2014 | 74 | 1338 | 1331 | +7 |
| Zhu2013 | 75 | 649 | 650 | −1 |
| Giles2015 | 78 | 720 | 716 | +4 |
| Dikow2009 | 88 | 1614 | 1614 | 0 |

Zhu2013 marginal regression at 10s resolves at 20s (median 649→644).
At 20s with 5 seeds: Zhu2013 645/643, Giles2015 712/710, Dikow2009
1611/1611 (all improvements).