Faster majority consensus & RenumberTips; Jansson evaluated (not implemented)#274
Merged
Conversation
consensus_tree()'s third argument is 'exact', not 'hash'; the script called consensus_tree(forest, p, hash = FALSE), which errored with an unused-argument error and so had not run since the C++ export was renamed. Use exact = TRUE; the script is now green (0 failures). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Majority/threshold Consensus() and SplitFrequency() built every distinct split's packed bit pattern eagerly on first sighting. At high tip counts most distinct splits never reach the consensus threshold, so that work was wasted. Each distinct split now keeps a 12-byte (tree, L, R) witness; the pattern is materialised only for splits that survive thresholding (consensus) or for all splits (SplitFrequency). Results are unchanged: split sets/counts identical to the deterministic exact path, verified at n up to 5000 (dev/profiling/correctness-gate.R, 590 checks; test-consensus 8/8; test-Support 6/6). Up to ~13x faster for large/tall trees, median 1.23x, no change at small sizes. Adds dev/profiling/ harness (correctness gate, benchmark grid, Jansson lower-bound analysis + DECISION.md explaining why the full deterministic Jansson algorithm is not implemented). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
RenumberTips() on an unlabelled multiPhylo or list previously relabelled each tree with a separate R call (RenumberTips.phylo per element). It now relabels the whole forest in a single C++ pass (renumber_tips_to), deriving each tree's permutation from one hash of the target labels, with a no-op pass-through for trees already in target order. The C++ helper returns NULL to fall back to the existing per-tree R path for anything it would not handle identically: numeric tipOrder (per-tree target), a label set differing from the target (different length, unknown/NA label, or non-bijection), duplicate/NA target labels, or non-phylo list elements. List names are preserved. Results are unchanged. 1.8-9x faster (9x at high-k/low-n: 5000 trees of 30 tips, 0.335s -> 0.037s); Consensus() and every other RenumberTips() caller share the win. Full test suite green; new equivalence test confirms the C++ path matches the per-tree path across mixed orderings. Telling label-match without looking: only a labelled multiPhylo (TipLabel attribute) guarantees a shared order without inspecting each tree; absent that, inspection is unavoidable but now done in C++. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Performance benchmark results
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #274 +/- ##
==========================================
+ Coverage 96.18% 96.24% +0.05%
==========================================
Files 81 81
Lines 6035 6120 +85
==========================================
+ Hits 5805 5890 +85
Misses 230 230 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Consensus() stripped edge.length/node.label from every input tree via a per-tree lapply that forced a copy of each tree. The consensus core reads only edge + tip.label and the output tree is rebuilt from scratch, so neither field can affect the result; the strip's only load-bearing job was coercing a labelled multiPhylo's shared tip labels onto each tree, which bare c() already does. Replace the strip with 'trees <- c(trees)' (~7x cheaper than the strip: 0.82->0.11 ms at k=201), cutting wrapper overhead ~25% on small forests of many short trees (forest201.80 3.38->2.54 ms), ~6% on forest1k.888 (strict C++ core dominates there). Output verified byte-identical: 108-cell before/after grid (plain / unaligned / edge.length / node.label / both / labelled / labelled+edge.length / list / differing-size x p x check.labels x hash) and a 36-cell core metadata-independence check. New test-Consensus.R block covers the previously untested metadata-bearing and labelled-multiPhylo paths. Full suite 4192 PASS; correctness gate 590 checks; verify-consensus green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The KeepTip repeat-loop (triggered by trees of unequal size) was the one combination not covered by the C-004 metadata-invariance test. Add a case confirming edge.length does not change the (warned) result there. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Performance benchmark results
|
The even-n tie fixture (2xBalancedTree + 2xPectinateTree, n=4) was only exercised by the "exact == hashed" test, which checks the two internal counters against each other -- it cannot catch an error shared by both. Add ApeTest(tie, 0.5): at p=0.5 a split in exactly n/2 trees is dropped (thresh = floor(n/2)+1), so the two conflicting 50% splits are both dropped, matching ape::consensus. Pins the tie boundary to an oracle. Separately, findings.md records an open question for the maintainer: for p > 0.5 the code keeps count >= floor(n*p)+1 (strictly more than floor(n*p)), whereas the docstring promises "p or more" and ape keeps count >= p*n. They diverge only when n*p is an exact integer (e.g. n=3, p=2/3: a clade in 2/3 of trees is kept by ape, dropped here). Both are safe; the convention/doc-consistency call is the maintainer's. No threshold change made. Repro: dev/profiling/drivers/tie-check.R. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Performance benchmark results
|
Performance benchmark results
|
Consensus(trees, p) documented that it retains splits in "a proportion p or more" of trees, but the threshold was floor(n*p)+1 -- one tree too strict, so a split in exactly p*n trees was dropped at integer thresholds (e.g. a split in 2 of 3 trees with p = 2/3). ape::consensus keeps it (count >= p*n). Change the C++ threshold to thresh = max(floor(n/2)+1, ceil(p*n)), with a relative tolerance (1e-9*pn) snapping p*n to an exact integer through floating-point noise -- robust where ape's raw `bs >= p*ntree` is fragile. p = 0.5 is unchanged: the floor(n/2)+1 term keeps it strict (a split must occur in MORE than half the trees) so two conflicting 50% splits can never both be retained and the result stays a valid tree. Pin the boundary with ApeTest(thirds, 2/3) (clade in exactly 2/3 of trees). Update docstring/Rd and NEWS. Gate 590 PASS / 0 ape disagreements; full suite 4195 PASS. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Performance benchmark results
|
The mem-check (valgrind) job stalls for many minutes in test-consensus.R: the 100k-leaf heap-limit build, the 33k-tip consensus, and two 33k-tree consensus runs dominate runtime under valgrind's ~30x slowdown while exercising no memory path the smaller cases miss. Gate them behind a new TESTING_MEMCHECK env var (set in memcheck.yml). TESTING_MEMCHECK is deliberately separate from USING_ASAN: the latter suppresses deliberate-bad-input error tests that ASan aborts on but valgrind tolerates, so setting USING_ASAN here would silently disable those error-path tests under valgrind. Full-scale consensus tests still run in uninstrumented CI. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Performance benchmark results
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Profiling-driven performance work on the consensus path, plus a data-backed
decision on Jansson's deterministic algorithm. Three independent pieces.
Three pieces:
Defer split materialisation in the hashed consensus counter.
Consensus()and
SplitFrequency()built every distinct split's packed bit pattern eagerlyon first sighting; at high tip counts most distinct splits never reach the
threshold, so that work was wasted. Each distinct split now keeps a 12-byte
(tree, L, R)witness and the pattern is built only for splits that survivethresholding (consensus) or for all splits (
SplitFrequency). Resultsunchanged; up to ~13× faster for large/tall trees, median 1.23×.
Faster
RenumberTips()for unlabelled forests. An unlabelledmultiPhylo/
listwas relabelled tree-by-tree in R; it is now relabelled in a singleC++ pass (
renumber_tips_to), with a no-op fast path for trees already in thetarget order. This was 54% of
Consensus()wall time at high-k/low-n.Consensus()and every otherRenumberTips()caller share the win.1.8–9× faster (9× at 5000 trees × 30 tips: 0.335 s → 0.037 s).
The C++ helper returns
NULLto fall back to the exact previous R path foranything it does not handle identically (numeric
tipOrder, differing labelsets, non-phylo elements), so behaviour is preserved on edge cases.
Jansson's deterministic O(kn) majority algorithm — evaluated, not
implemented. The headline ask. Using a rigorous lower bound (Jansson ≥
2×strict-path), Jansson is proven to lose to the optimised hashed counter in
every realistic, time-meaningful regime (high-k/low-n; high-k/high-n at
concordant and moderate conflict; most low-k/high-n). The only cells where it
is not proven to lose are sub-60 ms or degenerate near-star consensuses
(≈ independent random trees — not a real majority-consensus input). Combined
with the high implementation/correctness risk (the authors' own reference
impl, FACT, ships broken majority code) and the project's "avoid redundant
code" preference, it is not implemented. Full rationale, data, and the exact
algorithm to build it on request:
dev/profiling/DECISION.md.On telling label-match "without looking"
The structural answer: a labelled
multiPhylo(one carrying aTipLabelattribute) promises every tree shares that label vector, so
RenumberTips()already skips per-tree inspection and uses a single shared permutation. Without
that attribute there is no free lunch — each tree's labels must be inspected —
but that inspection (and the relabel) now happens in C++ rather than a per-tree
R loop, and a tree already in order is detected and passed through untouched.
Verification
dev/profiling/correctness-gate.R: 590 checks — hashed == exact consensussplit sets up to n=3000,
SplitFrequencysplit sets and counts atn=2000/5000, ape agrees at p=0.5.
RenumberTips()unlabelled C++ path is identical (edges, labels,order) to the per-tree path across mixed orderings.
dev/red-team/verify-consensus.Rgreen (also fixed: ithad been calling a non-existent
hash=arg and so had not run).🤖 Generated with Claude Code