Skip to content

T-306: gate HSJ/XFORM SPR/NNI accept-paths to full_rescore#249

Merged
ms609 merged 4 commits into
cpp-searchfrom
feature/t306-hsj-xform-accept-guard
Jun 15, 2026
Merged

T-306: gate HSJ/XFORM SPR/NNI accept-paths to full_rescore#249
ms609 merged 4 commits into
cpp-searchfrom
feature/t306-hsj-xform-accept-guard

Conversation

@ms609

@ms609 ms609 commented Jun 15, 2026

Copy link
Copy Markdown
Owner

T-306 (P3) — HSJ/XFORM SPR/NNI accept-paths omit hierarchy DP contribution from best_score

Problem

The SPR accept path in tbr_search() (src/ts_tbr.cpp) and the accept path in nni_search() (src/ts_search.cpp) updated best_score with a Fitch-only (EW) or IW/profile incremental delta. For HSJ and XFORM scoring, score_tree() additionally adds a topology-dependent hierarchy-DP (HSJ) / Sankoff (XFORM) term that the delta omits, so best_score drifted from the authoritative score and the search's internal accept/reject decisions tracked the wrong objective.

Final user-visible scores stayed correct (run_single_replicate always recomputes via score_tree() before pooling), so this was a silent search-quality regression, not a wrong-output bug.

Fix

Gate the incremental fast path on ds.scoring_mode ∈ {EW, IW, XPIWE, PROFILE}; HSJ and XFORM fall back to full_rescore() in both accept paths — the same scoring-mode classification used by the T-275/T-303 guards. The NNI rejection-restore path likewise rebuilds the postorder for HSJ/XFORM rather than running a stale incremental downpass. hsj_score()/score_tree() call fitch_score_ew() internally, so the fallback leaves the Fitch state arrays in the same state as the EW full_rescore branch already exercised for TBR rerooting moves — safe for continued iteration.

The standalone spr_search() already full_rescores on accept, so it was unaffected; scope matches the task (only the two named paths needed changes).

Tests

New Tier-2 test tests/testthat/test-ts-t306-accept-guard.R drives the full HSJ and XFORM driven search on a brute-forceable 7-tip dataset (945 unrooted trees) and asserts:

  • the search reaches the true global optimum (brute-forced via TreeLength);
  • every returned tree's independent score equals the reported score (no best_score desync);
  • identical seeds are deterministic.

The HSJ optimum on the test dataset is a strict subset of the Fitch optimum (3 ⊊ 15 of 945), so reaching it exercises the hierarchy-aware scoring, not just the Fitch component.

Note on discrimination: the bug is silent at the reported-score level — an empirical sweep confirmed that a buggy build still reaches the optimum from every Fitch-optimal "trap" start tree, because the driven pipeline's TBR rerooting moves (always full_rescore) recover the optimum regardless. A black-box MaximizeParsimony() test therefore cannot discriminate buggy from fixed (matching the T-303 sibling finding). The test instead locks in that the new full_rescore fallback yields a correct, consistent, deterministic search.

Validation

  • Local targeted tests pass: new T-306 test (42 assertions) + existing test-ts-hsj (37), test-ts-xform (68), test-ts-tbr-search (28), test-ts-iw (86), test-ts-spr-nni-opt (45), test-ts-nni-iw-rescore (20) — no regressions.
  • GHA agent-check dispatched: run 27528341506.
  • Vignette search-algorithm.Rmd updated to document the HSJ/XFORM full-rescore fallback (per the AGENTS.md scoring-change rule).

Sibling task T-303 (sector path, same bug family) is handled separately in a different file set.

🤖 Generated with Claude Code

The SPR accept path in tbr_search() (src/ts_tbr.cpp) and the accept path
in nni_search() (src/ts_search.cpp) updated best_score with a Fitch-only
(EW) or IW/profile incremental delta. For HSJ and XFORM scoring,
score_tree() additionally adds a topology-dependent hierarchy-DP (HSJ) or
Sankoff (XFORM) term that the delta omits, so the search's internal
accept/reject decisions tracked the wrong objective (best_score drifted
from the authoritative score). Final user-visible scores stayed correct
because run_single_replicate always recomputes via score_tree() before
pooling, so this was a silent search-quality regression.

Fix: gate the incremental fast path on ds.scoring_mode being one of
EW/IW/XPIWE/PROFILE; HSJ and XFORM fall back to full_rescore() in both
accept paths (same scoring-mode classification as the T-275/T-303
guards). The NNI rejection-restore path likewise rebuilds the postorder
for HSJ/XFORM instead of a stale incremental downpass.

Adds a Tier-2 regression test (test-ts-t306-accept-guard.R) driving the
full HSJ and XFORM driven search on a brute-forceable 7-tip dataset and
asserting the search reaches the true global optimum, that every returned
tree's independent score equals the reported score (no best_score
desync), and that identical seeds are deterministic. Documents the
HSJ/XFORM full-rescore fallback in vignettes/search-algorithm.Rmd.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ms609 added a commit that referenced this pull request Jun 15, 2026
@ms609

ms609 commented Jun 15, 2026

Copy link
Copy Markdown
Owner Author

⚠️ GHA agent-check (run 27528341506) is red, but the failures are pre-existing on cpp-search and unrelated to this PR.

R CMD check reported 2 ERRORs, 2 WARNINGs, 1 NOTE, all in the WideSample/MaxMin feature area:

  • examples ERROR: Error in MaxMin::FarFirst(colFn, m = n, N = nTrees, progress = FALSE) (from the MaxMin/highs solver work, e.g. b527a0f3).
  • dependency WARNINGs: 'loadNamespace'/'library' call not declared from: 'highs'.
  • spelling ERROR: DropAdd, FarFirst, Fathi, MMDP, MaxMin, Porumbel, Resende, Sayyady, WideSample, medoid, pkg — all WideSample/MaxMin terms missing from inst/WORDLIST.

This PR touches only src/ts_tbr.cpp, src/ts_search.cpp, tests/testthat/test-ts-t306-accept-guard.R, and vignettes/search-algorithm.Rmd — none of the failing areas, and it introduces no new spelling errors. cpp-search HEAD has not had a green agent-check since the MaxMin/WideSample work merged (last success was 2026-03-29 at an old SHA), so this branch simply inherits that breakage.

Local targeted tests on the built branch pass: the new test-ts-t306-accept-guard.R (42 assertions) plus test-ts-hsj (37), test-ts-xform (68), test-ts-tbr-search (28), test-ts-iw (86), test-ts-spr-nni-opt (45), and test-ts-nni-iw-rescore (20) — no regressions.

Recommend merging the MaxMin/WideSample CI fix (example dependency + WORDLIST) on cpp-search first, after which a re-run of agent-check on this branch should be green.

ms609 and others added 3 commits June 15, 2026 07:47
Avoid a direct phangorn::phyDat call in the regression test helper; use
TreeTools::MatrixToPhyDat (already loaded) instead. Test still passes
(42 assertions): the dataset retains a sharp HSJ optimum that is a strict
subset of the Fitch optimum, and the search reaches the true optimum.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ms609 added a commit that referenced this pull request Jun 15, 2026
… T-308 secondary) (#250)

* T-307: Fix HSJ primary absent_state (was "-" index, not "0")

The driven HSJ pipeline hard-coded absent_state = 0L, which for the usual
levels c("-","0","1") is the index of the inapplicable token "-", not the
primary's "absent" state "0" (index 1). Primaries coded "0" were therefore
treated as PRESENT, so gain/loss of the structure was never counted; the
HSJ score was insensitive to primary presence/absence (verified: alpha=0
single-absent-tip scored 0 instead of 1).

- Add hsj_absent_state(dataset): 0-based index of the "0" token in `levels`
  (falls back to "-"), so the value tracks the dataset's level ordering
  instead of being hard-coded. build_tip_labels is level-order generic, so a
  constant was fragile regardless of which constant.
- Use it at every driven call site: MaximizeParsimony, Morphy (resample),
  TreeLength.phylo and TreeLength.list. Reconcile the test helper to compute
  it the same way (was hard-coded 1L).
- C++ score_hierarchy_block: treat the primary as absent when its label is
  the absent_state OR the inapplicable token, mirroring recode_hierarchy's
  `pri == "0" || pri == "-"` and fixing nested hierarchies where a primary
  may itself be inapplicable.
- Regression tests: alpha=0 primary sensitivity, driven-vs-direct agreement,
  hsj_absent_state across level orderings, primary-scoring level invariance.

Separate from T-306 (PR #249); discovered during that work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add WideSample false positives to WORDLIST

agent-check was failing on cpp-search before this branch: the spelling
test errored on DropAdd, FarFirst, Fathi, MMDP, MaxMin, Porumbel, Resende,
Sayyady, WideSample, medoid, pkg — author surnames, algorithm acronyms and
identifiers from recent WideSample work that were never added to the
wordlist. Added via spelling::update_wordlist(); restored config/maximin/
warmup that the tool would otherwise have pruned. Unrelated to the HSJ fix
but required to get this branch's CI green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Declare 'highs' in Suggests

R/WideSample.R conditionally uses highs via requireNamespace() and in
\dontrun/guarded examples, but it was never declared, so R CMD check
--as-cran (windows job, error-on="warning") failed with "'library' or
'require' call not declared from: 'highs'". Pre-existing on cpp-search
from recent WideSample work; required to get this branch's CI green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* T-308: Make HSJ secondary dissimilarity invariant to level ordering

fitch_label_char()'s uppass resolved ambiguous internal nodes to the
LOWEST SET BIT. Which token that bit denotes depends on the phyDat
`levels` ordering, so differently-ordered levels produced different
secondary reconstructions, different per-branch mismatch counts d, and
therefore different scores (the repro dataset scored 2.5 vs 2.0 at
hsj_alpha=1). A parsimony-style score must not depend on the arbitrary
internal ordering of levels.

Resolve instead toward the best-supported token in the node's own
subtree (max tip count, ties broken by smallest supporting tip index).
Both keys are properties of the tokens and the tree, not of the bit
encoding, giving a strict, deterministic, order-independent resolution
that is still a valid MPR — so the dissimilarity stays non-zero (the
concern that motivated adding the uppass in T-134).

Only the primary term was order-safe before (T-307); this closes the
secondary term. Verified invariant across all level permutations on the
repro plus a 1200-case randomised battery (multistate secondaries, "?"
missing data, random trees, alpha in {0,0.5,1}). Existing HSJ value
assertions are unchanged. Regression test extended to hsj_alpha in
{0.5, 1} across all orderings, plus a multistate-secondary case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* T-308: make HSJ absent-state helper private (.HSJAbsentState)

`hsj_absent_state` (added with T-307) breached the package naming
conventions: R functions are BigCamelCase and private helpers are
dot-prefixed (r-conventions). It was also @export-ed with a public man
page despite being internal C++-bridge plumbing that returns a 0-based
token index.

Rename to `.HSJAbsentState`, drop @export and the man page, and document
it with a plain comment like the package's other private helpers
(.ParseOneBlock, .HierarchicalResampleWeights). Call sites in
MaximizeParsimony.R, Morphy.R, tree_length.R and the test bridge updated.

No behaviour change; the function is still reachable via ::: for tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Rename hierarchy/HSJ module to package naming conventions

The whole hierarchical-scoring module was written in snake_case, breaching
the package convention (functions BigCamelCase; private functions dot-
prefixed; variables camelCase).  Following the .HSJAbsentState fix, sweep
the rest of the module:

Exported helpers -> BigCamelCase (snake_case names removed):
  recode_hierarchy   -> RecodeHierarchy        (public)
  hierarchy_from_names -> HierarchyFromNames    (public)
  validate_hierarchy -> ValidateHierarchy       (exported-internal)
  hierarchy_chars    -> HierarchyChars          (exported-internal)
  hierarchy_controlling -> HierarchyControlling (exported-internal)

Pure C++-bridge plumbing -> private, unexported (dot-prefixed), documented
with plain comments like the package's other private helpers:
  build_tip_labels   -> .BuildTipLabels
  hierarchy_to_blocks -> .HierarchyToBlocks
  non_hierarchy_weights -> .NonHierarchyWeights
  (.HSJAbsentState already done)

Also: nested helpers (.recode_block -> .RecodeBlock, .flatten_block ->
.FlattenBlock); the public `char_names` arg -> `charNames`; and local
variables in the dedicated module files (CharacterHierarchy.R,
recode_hierarchy.R) camelCased.  The internal resample-weights list fields
were renamed to camelCase too (`non_hierarchy_weights` -> `nonHierarchyWeights`,
`block_counts` -> `blockCounts`), updated in producer, consumer and tests.

Call sites updated across R/, tests/, the vignette and the Shiny app; man
pages and NAMESPACE regenerated via roxygen2.  NEWS records the breaking
rename.  All hierarchy/HSJ/xform/recode/resample/tree_length tests pass;
spelling clean.

Local variables in the large shared scoring functions (tree_length.R,
Morphy.R, MaximizeParsimony.R) keep their existing `tip_data`/`adj_weight`
style — those bridge locals are pervasive across non-hierarchy branches too,
so renaming them belongs to a separate cleanup.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@ms609 ms609 merged commit a15d740 into cpp-search Jun 15, 2026
3 of 7 checks passed
@ms609 ms609 deleted the feature/t306-hsj-xform-accept-guard branch June 15, 2026 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant