T-306: gate HSJ/XFORM SPR/NNI accept-paths to full_rescore by ms609 · Pull Request #249 · ms609/TreeSearch

ms609 · 2026-06-15T06:31:19Z

T-306 (P3) — HSJ/XFORM SPR/NNI accept-paths omit hierarchy DP contribution from `best_score`

Problem

The SPR accept path in tbr_search() (src/ts_tbr.cpp) and the accept path in nni_search() (src/ts_search.cpp) updated best_score with a Fitch-only (EW) or IW/profile incremental delta. For HSJ and XFORM scoring, score_tree() additionally adds a topology-dependent hierarchy-DP (HSJ) / Sankoff (XFORM) term that the delta omits, so best_score drifted from the authoritative score and the search's internal accept/reject decisions tracked the wrong objective.

Final user-visible scores stayed correct (run_single_replicate always recomputes via score_tree() before pooling), so this was a silent search-quality regression, not a wrong-output bug.

Fix

Gate the incremental fast path on ds.scoring_mode ∈ {EW, IW, XPIWE, PROFILE}; HSJ and XFORM fall back to full_rescore() in both accept paths — the same scoring-mode classification used by the T-275/T-303 guards. The NNI rejection-restore path likewise rebuilds the postorder for HSJ/XFORM rather than running a stale incremental downpass. hsj_score()/score_tree() call fitch_score_ew() internally, so the fallback leaves the Fitch state arrays in the same state as the EW full_rescore branch already exercised for TBR rerooting moves — safe for continued iteration.

The standalone spr_search() already full_rescores on accept, so it was unaffected; scope matches the task (only the two named paths needed changes).

Tests

New Tier-2 test tests/testthat/test-ts-t306-accept-guard.R drives the full HSJ and XFORM driven search on a brute-forceable 7-tip dataset (945 unrooted trees) and asserts:

the search reaches the true global optimum (brute-forced via TreeLength);
every returned tree's independent score equals the reported score (no best_score desync);
identical seeds are deterministic.

The HSJ optimum on the test dataset is a strict subset of the Fitch optimum (3 ⊊ 15 of 945), so reaching it exercises the hierarchy-aware scoring, not just the Fitch component.

Note on discrimination: the bug is silent at the reported-score level — an empirical sweep confirmed that a buggy build still reaches the optimum from every Fitch-optimal "trap" start tree, because the driven pipeline's TBR rerooting moves (always full_rescore) recover the optimum regardless. A black-box MaximizeParsimony() test therefore cannot discriminate buggy from fixed (matching the T-303 sibling finding). The test instead locks in that the new full_rescore fallback yields a correct, consistent, deterministic search.

Validation

Local targeted tests pass: new T-306 test (42 assertions) + existing test-ts-hsj (37), test-ts-xform (68), test-ts-tbr-search (28), test-ts-iw (86), test-ts-spr-nni-opt (45), test-ts-nni-iw-rescore (20) — no regressions.
GHA agent-check dispatched: run 27528341506.
Vignette search-algorithm.Rmd updated to document the HSJ/XFORM full-rescore fallback (per the AGENTS.md scoring-change rule).

Sibling task T-303 (sector path, same bug family) is handled separately in a different file set.

🤖 Generated with Claude Code

The SPR accept path in tbr_search() (src/ts_tbr.cpp) and the accept path in nni_search() (src/ts_search.cpp) updated best_score with a Fitch-only (EW) or IW/profile incremental delta. For HSJ and XFORM scoring, score_tree() additionally adds a topology-dependent hierarchy-DP (HSJ) or Sankoff (XFORM) term that the delta omits, so the search's internal accept/reject decisions tracked the wrong objective (best_score drifted from the authoritative score). Final user-visible scores stayed correct because run_single_replicate always recomputes via score_tree() before pooling, so this was a silent search-quality regression. Fix: gate the incremental fast path on ds.scoring_mode being one of EW/IW/XPIWE/PROFILE; HSJ and XFORM fall back to full_rescore() in both accept paths (same scoring-mode classification as the T-275/T-303 guards). The NNI rejection-restore path likewise rebuilds the postorder for HSJ/XFORM instead of a stale incremental downpass. Adds a Tier-2 regression test (test-ts-t306-accept-guard.R) driving the full HSJ and XFORM driven search on a brute-forceable 7-tip dataset and asserting the search reaches the true global optimum, that every returned tree's independent score equals the reported score (no best_score desync), and that identical seeds are deterministic. Documents the HSJ/XFORM full-rescore fallback in vignettes/search-algorithm.Rmd. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ms609 · 2026-06-15T06:43:40Z

⚠️ GHA agent-check (run 27528341506) is red, but the failures are pre-existing on cpp-search and unrelated to this PR.

R CMD check reported 2 ERRORs, 2 WARNINGs, 1 NOTE, all in the WideSample/MaxMin feature area:

examples ERROR: Error in MaxMin::FarFirst(colFn, m = n, N = nTrees, progress = FALSE) (from the MaxMin/highs solver work, e.g. b527a0f3).
dependency WARNINGs: 'loadNamespace'/'library' call not declared from: 'highs'.
spelling ERROR: DropAdd, FarFirst, Fathi, MMDP, MaxMin, Porumbel, Resende, Sayyady, WideSample, medoid, pkg — all WideSample/MaxMin terms missing from inst/WORDLIST.

This PR touches only src/ts_tbr.cpp, src/ts_search.cpp, tests/testthat/test-ts-t306-accept-guard.R, and vignettes/search-algorithm.Rmd — none of the failing areas, and it introduces no new spelling errors. cpp-search HEAD has not had a green agent-check since the MaxMin/WideSample work merged (last success was 2026-03-29 at an old SHA), so this branch simply inherits that breakage.

Local targeted tests on the built branch pass: the new test-ts-t306-accept-guard.R (42 assertions) plus test-ts-hsj (37), test-ts-xform (68), test-ts-tbr-search (28), test-ts-iw (86), test-ts-spr-nni-opt (45), and test-ts-nni-iw-rescore (20) — no regressions.

Recommend merging the MaxMin/WideSample CI fix (example dependency + WORDLIST) on cpp-search first, after which a re-run of agent-check on this branch should be green.

Avoid a direct phangorn::phyDat call in the regression test helper; use TreeTools::MatrixToPhyDat (already loaded) instead. Test still passes (42 assertions): the dataset retains a sharp HSJ optimum that is a strict subset of the Fitch optimum, and the search reaches the true optimum. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@export

… T-308 secondary) (#250) * T-307: Fix HSJ primary absent_state (was "-" index, not "0") The driven HSJ pipeline hard-coded absent_state = 0L, which for the usual levels c("-","0","1") is the index of the inapplicable token "-", not the primary's "absent" state "0" (index 1). Primaries coded "0" were therefore treated as PRESENT, so gain/loss of the structure was never counted; the HSJ score was insensitive to primary presence/absence (verified: alpha=0 single-absent-tip scored 0 instead of 1). - Add hsj_absent_state(dataset): 0-based index of the "0" token in `levels` (falls back to "-"), so the value tracks the dataset's level ordering instead of being hard-coded. build_tip_labels is level-order generic, so a constant was fragile regardless of which constant. - Use it at every driven call site: MaximizeParsimony, Morphy (resample), TreeLength.phylo and TreeLength.list. Reconcile the test helper to compute it the same way (was hard-coded 1L). - C++ score_hierarchy_block: treat the primary as absent when its label is the absent_state OR the inapplicable token, mirroring recode_hierarchy's `pri == "0" || pri == "-"` and fixing nested hierarchies where a primary may itself be inapplicable. - Regression tests: alpha=0 primary sensitivity, driven-vs-direct agreement, hsj_absent_state across level orderings, primary-scoring level invariance. Separate from T-306 (PR #249); discovered during that work. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Add WideSample false positives to WORDLIST agent-check was failing on cpp-search before this branch: the spelling test errored on DropAdd, FarFirst, Fathi, MMDP, MaxMin, Porumbel, Resende, Sayyady, WideSample, medoid, pkg — author surnames, algorithm acronyms and identifiers from recent WideSample work that were never added to the wordlist. Added via spelling::update_wordlist(); restored config/maximin/ warmup that the tool would otherwise have pruned. Unrelated to the HSJ fix but required to get this branch's CI green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Declare 'highs' in Suggests R/WideSample.R conditionally uses highs via requireNamespace() and in \dontrun/guarded examples, but it was never declared, so R CMD check --as-cran (windows job, error-on="warning") failed with "'library' or 'require' call not declared from: 'highs'". Pre-existing on cpp-search from recent WideSample work; required to get this branch's CI green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * T-308: Make HSJ secondary dissimilarity invariant to level ordering fitch_label_char()'s uppass resolved ambiguous internal nodes to the LOWEST SET BIT. Which token that bit denotes depends on the phyDat `levels` ordering, so differently-ordered levels produced different secondary reconstructions, different per-branch mismatch counts d, and therefore different scores (the repro dataset scored 2.5 vs 2.0 at hsj_alpha=1). A parsimony-style score must not depend on the arbitrary internal ordering of levels. Resolve instead toward the best-supported token in the node's own subtree (max tip count, ties broken by smallest supporting tip index). Both keys are properties of the tokens and the tree, not of the bit encoding, giving a strict, deterministic, order-independent resolution that is still a valid MPR — so the dissimilarity stays non-zero (the concern that motivated adding the uppass in T-134). Only the primary term was order-safe before (T-307); this closes the secondary term. Verified invariant across all level permutations on the repro plus a 1200-case randomised battery (multistate secondaries, "?" missing data, random trees, alpha in {0,0.5,1}). Existing HSJ value assertions are unchanged. Regression test extended to hsj_alpha in {0.5, 1} across all orderings, plus a multistate-secondary case. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * T-308: make HSJ absent-state helper private (.HSJAbsentState) `hsj_absent_state` (added with T-307) breached the package naming conventions: R functions are BigCamelCase and private helpers are dot-prefixed (r-conventions). It was also @export-ed with a public man page despite being internal C++-bridge plumbing that returns a 0-based token index. Rename to `.HSJAbsentState`, drop @export and the man page, and document it with a plain comment like the package's other private helpers (.ParseOneBlock, .HierarchicalResampleWeights). Call sites in MaximizeParsimony.R, Morphy.R, tree_length.R and the test bridge updated. No behaviour change; the function is still reachable via ::: for tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Rename hierarchy/HSJ module to package naming conventions The whole hierarchical-scoring module was written in snake_case, breaching the package convention (functions BigCamelCase; private functions dot- prefixed; variables camelCase). Following the .HSJAbsentState fix, sweep the rest of the module: Exported helpers -> BigCamelCase (snake_case names removed): recode_hierarchy -> RecodeHierarchy (public) hierarchy_from_names -> HierarchyFromNames (public) validate_hierarchy -> ValidateHierarchy (exported-internal) hierarchy_chars -> HierarchyChars (exported-internal) hierarchy_controlling -> HierarchyControlling (exported-internal) Pure C++-bridge plumbing -> private, unexported (dot-prefixed), documented with plain comments like the package's other private helpers: build_tip_labels -> .BuildTipLabels hierarchy_to_blocks -> .HierarchyToBlocks non_hierarchy_weights -> .NonHierarchyWeights (.HSJAbsentState already done) Also: nested helpers (.recode_block -> .RecodeBlock, .flatten_block -> .FlattenBlock); the public `char_names` arg -> `charNames`; and local variables in the dedicated module files (CharacterHierarchy.R, recode_hierarchy.R) camelCased. The internal resample-weights list fields were renamed to camelCase too (`non_hierarchy_weights` -> `nonHierarchyWeights`, `block_counts` -> `blockCounts`), updated in producer, consumer and tests. Call sites updated across R/, tests/, the vignette and the Shiny app; man pages and NAMESPACE regenerated via roxygen2. NEWS records the breaking rename. All hierarchy/HSJ/xform/recode/resample/tree_length tests pass; spelling clean. Local variables in the large shared scoring functions (tree_length.R, Morphy.R, MaximizeParsimony.R) keep their existing `tip_data`/`adj_weight` style — those bridge locals are pervasive across non-hierarchy branches too, so renaming them belongs to a separate cleanup. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

ms609 added a commit that referenced this pull request Jun 15, 2026

coord: T-306 -> PR #249 (t306)

07527b2

ms609 and others added 3 commits June 15, 2026 07:47

Merge branch 'cpp-search' into feature/t306-hsj-xform-accept-guard

d640411

Merge branch 'cpp-search' into feature/t306-hsj-xform-accept-guard

95beaaa

ms609 merged commit a15d740 into cpp-search Jun 15, 2026
3 of 7 checks passed

ms609 deleted the feature/t306-hsj-xform-accept-guard branch June 15, 2026 11:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

T-306: gate HSJ/XFORM SPR/NNI accept-paths to full_rescore#249

T-306: gate HSJ/XFORM SPR/NNI accept-paths to full_rescore#249
ms609 merged 4 commits into
cpp-searchfrom
feature/t306-hsj-xform-accept-guard

ms609 commented Jun 15, 2026

Uh oh!

ms609 commented Jun 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ms609 commented Jun 15, 2026

T-306 (P3) — HSJ/XFORM SPR/NNI accept-paths omit hierarchy DP contribution from best_score

Problem

Fix

Tests

Validation

Uh oh!

ms609 commented Jun 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

T-306 (P3) — HSJ/XFORM SPR/NNI accept-paths omit hierarchy DP contribution from `best_score`