
StructurePreprocessor: per-feature failure isolation + msms/backend routing #340

Description

@breimanntools

Part of #336 (usability epic).

Problem

I hit two sharp edges while using StructurePreprocessor on ~400 PDB chains:

  1. depth silently depends on the external msms binary. With on_failure="nan", one unavailable
    feature sets the whole entry to NaN (not just that column), so I got "0/1000 windows mapped"
    with no warning and only found the cause by testing features one at a time.
  2. The feature/backend split is opaque. Features are divided between encode_dssp and encode_pdb,
    and mixing them fails with a cryptic error: "contact_count_8A … should be encoded by … use the
    matching method". The message does not tell the user which method owns which key.

Suggestion

  • Per-feature failure isolation: drop/NaN only the failing feature column, keep the rest; emit a
    one-line warning ("msms not found → depth unavailable").
  • A single encode(features=[...]) that routes each feature key to the right backend
    (dssp / pdb / pae), so users don't need to know the split.
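
To make the two suggestions concrete, here is a minimal sketch of how routing and per-feature isolation could fit together. It is not the current StructurePreprocessor API. Everything in it is hypothetical: the `encode` signature, the `FeatureUnavailable` exception, the `FEATURE_BACKEND` table (including the guess that depth belongs to the pdb backend), the placeholder feature keys, and the toy backends that stand in for the real dssp / pdb / pae encoders.

```python
import math
import warnings


class FeatureUnavailable(RuntimeError):
    """Hypothetical error a backend raises when one feature cannot be computed."""


# Hypothetical ownership table: feature key -> backend name.
# The real assignment of keys to backends may differ.
FEATURE_BACKEND = {
    "depth": "pdb",
    "example_dssp_feature": "dssp",
    "example_pae_feature": "pae",
}


def make_pdb_backend(msms_available):
    """Toy pdb backend; a real one might check shutil.which("msms") instead."""
    def backend(key, n_residues):
        if key == "depth" and not msms_available:
            raise FeatureUnavailable("msms not found")
        return [1.0] * n_residues  # placeholder values
    return backend


def constant_backend(value):
    """Toy backend that returns the same value for every residue."""
    def backend(key, n_residues):
        return [value] * n_residues
    return backend


def encode(features, backends, feature_backend, n_residues):
    """Route each feature key to its backend; a failure only NaNs that column."""
    columns = {}
    for key in features:
        backend_name = feature_backend.get(key)
        if backend_name is None:
            known = ", ".join(sorted(feature_backend))
            raise KeyError(f"unknown feature {key!r}; known features: {known}")
        try:
            columns[key] = backends[backend_name](key, n_residues)
        except FeatureUnavailable as err:
            # One-line warning, and only the failing column becomes NaN.
            warnings.warn(f"{err} -> {key} unavailable", RuntimeWarning)
            columns[key] = [math.nan] * n_residues
    return columns


if __name__ == "__main__":
    backends = {
        "pdb": make_pdb_backend(msms_available=False),
        "dssp": constant_backend(0.5),
        "pae": constant_backend(2.0),
    }
    result = encode(
        ["depth", "example_dssp_feature", "example_pae_feature"],
        backends,
        FEATURE_BACKEND,
        n_residues=3,
    )
    print(result)
```

With msms missing, only the depth column is filled with NaN and a single RuntimeWarning is emitted. The other columns come back intact, and the caller never has to pick between per-backend methods. An unknown key fails early with a message that lists the known keys.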
