
find_features / Pattern splits fail on free peptides (no flanks) #338

Description

@breimanntools

Part of #336 (usability epic).

Problem

find_features (and the CPP Pattern/PeriodicPattern splits) are unusable on free peptides with no
flanking context, which is the entire linear-epitope use case. Pattern/PeriodicPattern carry a
hardcoded len_max=15. The find_features search stage applies them regardless of kws["n_split_max"],
and it carves a JMD out of the sequence, so a short free peptide (target region <15 aa after the JMD
is removed) raises:

ValueError: For split_kws (n_max=15): '{'Segment': ..., 'Pattern': {... 'len_max': 15}, ...}',
  following 'tmd' part contains too short sequences (e.g., 'PQFTIFGT', n=8).

The message explains neither the real cause nor the fix, and the error recurs for any peptide dataset
whose target region is shorter than ~15 aa.
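The failure reduces to simple length arithmetic. The sketch below models it in plain Python, assuming the check compares the length of the target ('tmd') part left after removing the N- and C-terminal JMDs against the Pattern len_max. The function names and the default JMD length of 10 are hypothetical illustrations, not aaanalysis API.

```python
def target_length(seq: str, jmd_n_len: int = 10, jmd_c_len: int = 10) -> int:
    """Length of the target part left after carving JMDs off both ends (hypothetical helper)."""
    return max(len(seq) - jmd_n_len - jmd_c_len, 0)


def too_short_for_pattern(seq: str, jmd_n_len: int = 10, jmd_c_len: int = 10,
                          len_max: int = 15) -> bool:
    """Assumed failure condition: target part shorter than the Pattern len_max."""
    return target_length(seq, jmd_n_len, jmd_c_len) < len_max


# A free 8-mer fails even without any JMD (n_jmd=0), because 8 < 15.
print(too_short_for_pattern("PQFTIFGT", jmd_n_len=0, jmd_c_len=0))  # True
```

This is why dropping the JMD alone does not help: with len_max fixed at 15, any peptide shorter than 15 aa stays too short for Pattern splits.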

Suggestion

  • Auto-drop Pattern/PeriodicPattern when a part is shorter than its len_max (or when n_jmd=0),
    and honor a len_max passed via kws instead of the hardcoded value.
  • Raise a clear, actionable error, e.g. "target region too short for Pattern splits: use
    Segment-only splits or add flanks."
  • Consider a documented "free peptide / no-flank" recipe for find_features.
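The first two suggestions could be combined into one pre-filtering step. This is a minimal sketch, assuming split_kws is a dict keyed by split type (as in the error message above) and that a missing len_max means the hardcoded 15. filter_split_kws is a hypothetical helper, not part of aaanalysis, and the example split_kws values are illustrative.

```python
import copy

PATTERN_SPLITS = ("Pattern", "PeriodicPattern")


def filter_split_kws(split_kws: dict, min_part_len: int, default_len_max: int = 15) -> dict:
    """Drop Pattern-type splits that cannot fit the shortest part (hypothetical helper).

    Assumes a split type without an explicit 'len_max' uses the hardcoded default of 15.
    """
    filtered = copy.deepcopy(split_kws)  # never mutate the caller's dict
    for name in PATTERN_SPLITS:
        kws = filtered.get(name)
        if kws is None:
            continue
        # Honor a user-supplied len_max; fall back to the assumed hardcoded value.
        if min_part_len < kws.get("len_max", default_len_max):
            del filtered[name]
    if not filtered:
        # Actionable message naming the cause and the two possible fixes.
        raise ValueError(
            f"Target region too short for Pattern splits (shortest part: {min_part_len} aa): "
            "use Segment-only split_kws or add flanking residues."
        )
    return filtered


split_kws = {
    "Segment": {"n_split_min": 1, "n_split_max": 15},
    "Pattern": {"steps": [3, 4], "n_min": 2, "n_max": 4, "len_max": 15},
    "PeriodicPattern": {"steps": [3, 4]},
}
print(sorted(filter_split_kws(split_kws, min_part_len=8)))  # ['Segment']
```

Raising only when nothing is left keeps Segment-only runs working on free peptides, which is also the natural core of a "no-flank" recipe.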
