
fix(cpp): make CPP splits usable on free peptides with no flanks (#338) #349

Draft
breimanntools wants to merge 1 commit into master from fix/338-free-peptide-splits



@breimanntools (Owner)

Closes #338.

Summary

Pattern/PeriodicPattern splits and aap.find_features were unusable on free peptides with no
flanking context (the linear-epitope case): the default split config requires each part to be >= 15
residues, so any target region shorter than ~15 aa raised an opaque "too short" ValueError.
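
The failure comes down to a minimum-length check on the target parts. The following is a minimal Python sketch of a check that names the binding parameter instead of raising a bare "too short" error. It assumes a simplified, hypothetical split-config dict and a hypothetical length rule (Segment needs n_split_max residues, Pattern needs len_max residues); check_parts and binding_length are illustrative names, not aaanalysis functions.

```python
# Hypothetical sketch: names and config shape are illustrative, not aaanalysis internals.

# ASSUMPTION: simplified split config keyed by split type.
DEFAULT_SPLIT_KWS = {
    "Segment": {"n_split_min": 1, "n_split_max": 15},
    "Pattern": {"steps": [3, 4], "len_max": 15},
}

# ASSUMPTION: the parameter that sets the minimum residue count for each split type.
LENGTH_PARAM = {"Segment": "n_split_max", "Pattern": "len_max"}


def binding_length(split_kws):
    """Return (required_len, split_type, param) for the most demanding split type."""
    candidates = [
        (kws[LENGTH_PARAM[split_type]], split_type, LENGTH_PARAM[split_type])
        for split_type, kws in split_kws.items()
        if split_type in LENGTH_PARAM
    ]
    # On ties, max() keeps the first candidate in config order.
    return max(candidates, key=lambda c: c[0])


def check_parts(part_lengths, split_kws):
    """Raise a ValueError naming the binding parameter and concrete fixes."""
    shortest = min(part_lengths)
    required, split_type, param = binding_length(split_kws)
    if shortest < required:
        raise ValueError(
            f"Shortest part has {shortest} residues, but {split_type} "
            f"({param}={required}) needs at least {required}. Fix: use "
            f"Segment-only splits, lower len_max/n_split_max to <= {shortest}, "
            f"or add jmd_n/jmd_c context."
        )


# A 9-residue free peptide with no flanks fails on the Segment limit.
try:
    check_parts([9], DEFAULT_SPLIT_KWS)
except ValueError as err:
    print(err)
```

Under these assumptions, a 9-residue free peptide is rejected with a message pointing at n_split_max=15 rather than an opaque length error.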

Details

  • Actionable error: check_match_df_parts_split_kws now reports the binding (limiting) split length,
    i.e. which split type and parameter determine n_max, and suggests a concrete fix (Segment-only
    splits, lower len_max/n_split_max, or add jmd_n/jmd_c context).
  • Honor kws: the kws argument of find_features now accepts len_max, and n_split_max/len_max are
    now passed through to the split config both on the fast path (where they were previously
    ignored) and in the CPPGrid search stages.
  • Auto-fit (fast path): the split config auto-fits to the shortest part, dropping
    Pattern/PeriodicPattern and clamping n_split_max with a single UserWarning, so free peptides
    run out of the box. Output is byte-identical when parts are long enough.
  • Search Stage 3: the simplify CPP now uses the winner's split_kws instead of the default
    split_kws. The default previously raised a hard error on short parts, even though the grid had
    already soft-dropped the non-fitting configs.
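
The "Honor kws" and "Auto-fit" behaviour can be pictured with a minimal Python sketch. It uses a hypothetical split-config dict (not the real aaanalysis structure); build_split_kws and auto_fit_split_kws are illustrative names, and the rule that Pattern and PeriodicPattern are dropped together when Pattern's len_max exceeds the shortest part is an assumption.

```python
import copy
import warnings

# ASSUMPTION: simplified, hypothetical split config keyed by split type.
DEFAULT_SPLIT_KWS = {
    "Segment": {"n_split_min": 1, "n_split_max": 15},
    "Pattern": {"steps": [3, 4], "len_max": 15},
    "PeriodicPattern": {"steps": [3, 4]},
}


def build_split_kws(kws=None):
    """Thread user-supplied n_split_max/len_max into a copy of the default config."""
    kws = kws or {}
    split_kws = copy.deepcopy(DEFAULT_SPLIT_KWS)
    if "n_split_max" in kws:
        split_kws["Segment"]["n_split_max"] = kws["n_split_max"]
    if "len_max" in kws:
        split_kws["Pattern"]["len_max"] = kws["len_max"]
    return split_kws


def auto_fit_split_kws(split_kws, min_part_len):
    """Fit the config to the shortest part; warn once if anything had to change."""
    fitted = copy.deepcopy(split_kws)
    changes = []
    pattern = fitted.get("Pattern")
    if pattern is not None and pattern["len_max"] > min_part_len:
        # ASSUMPTION: both pattern-based split types are dropped together.
        fitted.pop("Pattern")
        fitted.pop("PeriodicPattern", None)
        changes.append("dropped Pattern/PeriodicPattern")
    segment = fitted["Segment"]
    if segment["n_split_max"] > min_part_len:
        segment["n_split_max"] = min_part_len
        segment["n_split_min"] = min(segment["n_split_min"], min_part_len)
        changes.append(f"clamped n_split_max to {min_part_len}")
    if changes:
        # A single warning summarises every adjustment.
        warnings.warn(
            f"Split config auto-fitted to shortest part ({min_part_len} residues): "
            + "; ".join(changes) + ".",
            UserWarning,
        )
    return fitted


fitted = auto_fit_split_kws(build_split_kws(), min_part_len=9)  # emits one UserWarning
print(fitted)  # {'Segment': {'n_split_min': 1, 'n_split_max': 9}}
```

Because the helper only changes the config when something does not fit, a long-enough input gets back an equal config and no warning, which is the property behind the byte-identical claim for flanked inputs.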

Results for flanked inputs are unchanged (byte-identical), as verified by an A/B hash comparison of
fast-path results between master and this branch.
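
The PR does not show how the hashes were computed. One way to run such a master-vs-branch A/B comparison, sketched here under the assumption that each run's output is first converted to a list of row dicts (for a pandas DataFrame, df.to_dict("records")), is to hash a canonical JSON serialization. result_hash and same_results are illustrative names, and the rows below are toy data.

```python
import hashlib
import json


def result_hash(rows):
    """SHA-256 of a result table given as a list of dicts (e.g. df.to_dict("records"))."""
    # sort_keys makes the hash independent of key order within each row;
    # row order is kept, so a reordered ranking counts as a difference.
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def same_results(master_rows, branch_rows):
    """A/B check: True only if both runs serialize to the same bytes."""
    return result_hash(master_rows) == result_hash(branch_rows)


# Hypothetical toy rows standing in for fast-path output on flanked inputs.
master = [{"feature": "feat_1", "score": 0.42}, {"feature": "feat_2", "score": 0.17}]
branch = [{"score": 0.42, "feature": "feat_1"}, {"score": 0.17, "feature": "feat_2"}]
print(same_results(master, branch))  # True: same values, key order ignored
```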

Ripple

Part of epic #336.

🤖 Generated with Claude Code

Pattern/PeriodicPattern splits and aap.find_features were unusable on free
peptides with no flanking context (linear-epitope case): the default split
config requires each part to be >= 15 residues, so any target region shorter
than ~15 aa raised an opaque "too short" ValueError.

- Actionable error: check_match_df_parts_split_kws now names the binding split
  length (which split type / parameter drives n_max) and the concrete fix
  (Segment-only splits, lower len_max/n_split_max, or add jmd_n/jmd_c context).
- Honor kws: find_features 'kws' now accepts 'len_max' and actually threads
  'n_split_max'/'len_max' into the split config for the fast path (previously
  ignored) and the CPPGrid search stages, so shorter Pattern/Segment splits can
  be requested.
- Auto-fit (fast path): the split config auto-fits to the shortest part,
  dropping Pattern/PeriodicPattern and clamping n_split_max with a UserWarning,
  so free peptides run out of the box. Byte-identical when parts are long enough.
- Search Stage 3: the simplify CPP now uses the winner's split_kws instead of
  the default split_kws; the default previously raised a hard error on short
  parts even though the grid had already soft-dropped the non-fitting configs.
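
The Stage 3 bug is easiest to see with toy configs. In this hypothetical Python sketch (run_grid, simplify_stage, required_len, the config shape, and the toy score are all illustrative assumptions), the grid soft-drops configs that do not fit a 9-residue part, and the simplify step succeeds only because it reuses the winner's split_kws; passing the 15-residue default instead raises.

```python
# Hypothetical sketch of the Stage 3 fix; names and config shape are illustrative.


def required_len(split_kws):
    """ASSUMPTION: Segment needs n_split_max residues, Pattern needs len_max."""
    need = split_kws["Segment"]["n_split_max"]
    if "Pattern" in split_kws:
        need = max(need, split_kws["Pattern"]["len_max"])
    return need


def run_grid(configs, min_part_len, score_fn):
    """Soft-drop configs that do not fit the shortest part; return the best survivor."""
    fitting = [cfg for cfg in configs if required_len(cfg) <= min_part_len]
    if not fitting:
        raise ValueError("No split configuration fits the shortest part.")
    return max(fitting, key=score_fn)


def simplify_stage(split_kws, min_part_len):
    """Stage 3 stand-in: hard-errors if the config does not fit, like the old default path."""
    if required_len(split_kws) > min_part_len:
        raise ValueError(f"Parts of length {min_part_len} are too short for this split config.")
    return split_kws  # the real stage would rebuild the CPP here


default_kws = {"Segment": {"n_split_max": 15}, "Pattern": {"len_max": 15}}
grid = [default_kws, {"Segment": {"n_split_max": 5}}, {"Segment": {"n_split_max": 8}}]
# Toy score: prefer more segments (illustration only).
winner = run_grid(grid, min_part_len=9, score_fn=lambda cfg: cfg["Segment"]["n_split_max"])
simplify_stage(winner, min_part_len=9)  # fixed path: reuses the winner's split_kws
# simplify_stage(default_kws, min_part_len=9) would raise ValueError (old path)
```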

Results for flanked inputs are unchanged (byte-identical). Adds unit tests for
the actionable message, Segment-only / reduced-len_max short-part paths, the
auto-fit helper (drop/clamp + warning), and free-peptide fast/balanced runs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

codecov Bot commented Jul 4, 2026


Codecov Report

❌ Patch coverage is 87.75510% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.83%. Comparing base (7dcc8d8) to head (b755e5c).
⚠️ Report is 6 commits behind head on master.

Files with missing lines                                Patch %   Lines
aaanalysis/pipe/_find_features.py                       83.87%    4 Missing and 1 partial ⚠️
...ysis/feature_engineering/_backend/check_feature.py   94.44%    0 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (87.75%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
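
Assuming Codecov's usual ratio, hits / (hits + misses + partials), with partially covered lines counted as not covered, the percentages in this report can be reproduced from its counts. The project totals come from the coverage diff; the per-file and patch hit counts (26 of 31, 17 of 18, 43 of 49) are derived here from the reported percentages and missing/partial counts, not reported directly.

```python
def coverage_pct(hits, misses, partials):
    """Codecov-style ratio: partially covered lines count as not covered."""
    return 100.0 * hits / (hits + misses + partials)


# Project totals from the coverage diff (master vs. this branch).
print(f"{coverage_pct(17797, 633, 337):.2f}")  # 94.83 (master)
print(f"{coverage_pct(17823, 634, 337):.2f}")  # 94.83 (branch)

# Per-file patch lines: 4 missing + 1 partial, and 0 missing + 1 partial.
print(f"{coverage_pct(26, 4, 1):.2f}")  # 83.87 (_find_features.py)
print(f"{coverage_pct(17, 0, 1):.2f}")  # 94.44 (check_feature.py)

# Whole patch: 6 uncovered lines out of 49, below the 90% target.
print(f"{coverage_pct(43, 4, 2):.5f}")  # 87.75510
```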

Additional details and impacted files


@@           Coverage Diff           @@
##           master     #349   +/-   ##
=======================================
  Coverage   94.83%   94.83%           
=======================================
  Files         196      196           
  Lines       18767    18794   +27     
  Branches     3175     3180    +5     
=======================================
+ Hits        17797    17823   +26     
- Misses        633      634    +1     
  Partials      337      337           
Files with missing lines                                Coverage Δ
...ysis/feature_engineering/_backend/check_feature.py   93.43% <94.44%> (ø)
aaanalysis/pipe/_find_features.py                       58.11% <83.87%> (+3.66%) ⬆️

Components   Coverage Δ
cpp_core     94.95% <ø> (ø)



Development

Successfully merging this pull request may close these issues.

find_features / Pattern splits fail on free peptides (no flanks)
