fix(cpp): make CPP splits usable on free peptides with no flanks (#338) #349
Draft
breimanntools wants to merge 1 commit into
Pattern/PeriodicPattern splits and aap.find_features were unusable on free peptides with no flanking context (linear-epitope case): the default split config requires each part to be >= 15 residues, so any target region shorter than ~15 aa raised an opaque "too short" ValueError.

- Actionable error: check_match_df_parts_split_kws now names the binding split length (which split type / parameter drives n_max) and the concrete fix (Segment-only splits, lower len_max/n_split_max, or add jmd_n/jmd_c context).
- Honor kws: find_features 'kws' now accepts 'len_max' and actually threads 'n_split_max'/'len_max' into the split config for the fast path (previously ignored) and the CPPGrid search stages, so shorter Pattern/Segment splits can be requested.
- Auto-fit (fast path): the split config auto-fits to the shortest part, dropping Pattern/PeriodicPattern and clamping n_split_max with a UserWarning, so free peptides run out of the box. Byte-identical when parts are long enough.
- Search Stage 3: the simplify CPP now uses the winner's split_kws instead of the default, which previously hard-errored on short parts even though the grid had gracefully soft-dropped the non-fitting configs.

Results for flanked inputs are unchanged (byte-identical). Adds unit tests for the actionable message, Segment-only / reduced-len_max short-part paths, the auto-fit helper (drop/clamp + warning), and free-peptide fast/balanced runs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
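To illustrate the "actionable error" idea from the commit message, here is a minimal sketch of a length check that names the binding parameter and lists the suggested fixes. It is not the code of check_match_df_parts_split_kws: the function name `check_parts_fit`, its signature, and the rule that Segment needs `n_split_max` residues and Pattern needs `len_max` residues are assumptions made for illustration.

```python
def check_parts_fit(split_kws, min_part_len):
    """Raise an actionable ValueError if the shortest part cannot be split.

    Hypothetical stand-in for the real check; the length rules are assumed.
    """
    # Assumed rule: each split type needs at least this many residues per part.
    required = {}
    if "Segment" in split_kws:
        required["Segment 'n_split_max'"] = split_kws["Segment"]["n_split_max"]
    if "Pattern" in split_kws:
        required["Pattern 'len_max'"] = split_kws["Pattern"]["len_max"]
    if not required:
        return
    # The binding parameter is the one that demands the longest part.
    binding, n_needed = max(required.items(), key=lambda item: item[1])
    if min_part_len < n_needed:
        raise ValueError(
            f"Shortest part has {min_part_len} residues, but {binding}={n_needed} "
            f"requires at least {n_needed}. Use Segment-only splits, lower "
            f"'len_max'/'n_split_max', or add 'jmd_n'/'jmd_c' context."
        )
```

Calling `check_parts_fit({"Segment": {"n_split_max": 15}}, 9)` raises, while the same call with a part length of 15 or more returns silently.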
Codecov Report
❌ Your patch check has failed because the patch coverage (87.75%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## master #349 +/- ##
=======================================
Coverage 94.83% 94.83%
=======================================
Files 196 196
Lines 18767 18794 +27
Branches 3175 3180 +5
=======================================
+ Hits 17797 17823 +26
- Misses 633 634 +1
Partials 337 337
Closes #338.
Summary
Pattern/PeriodicPattern splits and aap.find_features were unusable on free peptides with no flanking context (the linear-epitope case): the default split config requires each part to be >= 15 residues, so any target region shorter than ~15 aa raised an opaque "too short" ValueError.
Details
- Actionable error: check_match_df_parts_split_kws now names the binding split length (which split type / parameter drives n_max) and the concrete fix (Segment-only splits, lower len_max/n_split_max, or add jmd_n/jmd_c context).
- Honor kws: find_features kws now accepts len_max and actually threads n_split_max/len_max into the split config for the fast path (previously ignored) and the CPPGrid search stages.
- Auto-fit (fast path): the split config auto-fits to the shortest part, dropping Pattern/PeriodicPattern and clamping n_split_max with a single UserWarning, so free peptides run out of the box. Byte-identical when parts are long enough.
- Search Stage 3: the simplify CPP now uses the winner's split_kws instead of the default, which previously hard-errored on short parts even though the grid had gracefully soft-dropped the non-fitting configs.
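The auto-fit behavior described above (drop Pattern/PeriodicPattern, clamp n_split_max, warn once, leave long-enough inputs untouched) can be sketched roughly as follows. This is an illustration under stated assumptions, not the helper added in this PR: `auto_fit_split_kws`, the `DEFAULT_SPLIT_KWS` values, and the rule that Pattern's `len_max` decides whether both pattern-type splits fit are all hypothetical.

```python
import copy
import warnings

# Hypothetical default config, modeled on the ">= 15 residues" requirement.
DEFAULT_SPLIT_KWS = {
    "Segment": {"n_split_min": 1, "n_split_max": 15},
    "Pattern": {"steps": [3, 4], "n_min": 2, "n_max": 4, "len_max": 15},
    "PeriodicPattern": {"steps": [3, 4]},
}


def auto_fit_split_kws(split_kws, min_part_len):
    """Return a copy of split_kws that fits the shortest part (sketch only)."""
    fitted = copy.deepcopy(split_kws)
    changes = []
    # Assumed rule: pattern-type splits need at least Pattern's len_max residues.
    len_max = fitted.get("Pattern", {}).get("len_max")
    if len_max is not None and min_part_len < len_max:
        for split_type in ("Pattern", "PeriodicPattern"):
            if fitted.pop(split_type, None) is not None:
                changes.append(f"dropped {split_type}")
    # Assumed rule: a part cannot be cut into more segments than it has residues.
    segment = fitted.get("Segment")
    if segment is not None and segment["n_split_max"] > min_part_len:
        segment["n_split_max"] = max(min_part_len, segment["n_split_min"])
        changes.append(f"clamped n_split_max to {segment['n_split_max']}")
    # Emit one combined warning so callers see a single message per run.
    if changes:
        warnings.warn(
            f"Split config auto-fitted to shortest part ({min_part_len} aa): "
            + ", ".join(changes),
            UserWarning,
        )
    return fitted
```

With a 9-residue peptide this sketch keeps only the Segment split with `n_split_max` set to 9 and warns once. With parts of 15 residues or more it returns an equal config and stays silent, which mirrors the "byte-identical when parts are long enough" guarantee.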
Results for flanked inputs are unchanged (byte-identical), proven by a fast-path A/B hash comparison of master vs. branch.
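How the A/B hash comparison was run is not shown here. One plausible way to check byte-identical results is to serialize each run's output deterministically and compare digests, as in this sketch. The `result_hash` helper and the row contents are hypothetical placeholders, not output from this PR.

```python
import hashlib
import json


def result_hash(rows):
    """Return a SHA-256 digest of a list of row dicts (illustrative helper)."""
    # sort_keys and fixed separators make the serialization deterministic,
    # so equal content always produces equal bytes.
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder rows standing in for the feature table produced on each side.
rows_master = [{"feature": "feature_a", "score": 0.25}]
rows_branch = [{"score": 0.25, "feature": "feature_a"}]
rows_changed = [{"feature": "feature_a", "score": 0.5}]

identical = result_hash(rows_master) == result_hash(rows_branch)
differs = result_hash(rows_master) != result_hash(rows_changed)
```

Comparing digests rather than the tables themselves keeps the check cheap and makes any drift between master and branch visible as a single mismatch.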
Ripple
Unit tests for the actionable message, Segment-only / reduced-len_max short-part paths, the auto-fit helper (drop/clamp + warning), and free-peptide fast/balanced runs.
Fixed entry on merge).
Part of epic #336.
🤖 Generated with Claude Code