Fix batch of low-risk correctness bugs: defensive DataFrame copies, config/plotting validation gaps, validator messages

## Problem

A full read-only audit (July 2026) surfaced a cluster of independent, low-risk
correctness defects that share one property: **none change published numeric
output**, so all can be fixed and regression-tested without a regolden. Left in
place they cause real harm:

- **Silent data corruption (High).** `load_dataset(non_canonical_aa="gap")` and
  `load_features()` return the shared `@lru_cache` DataFrame without a defensive
  copy, so an in-place edit — or the gap substitution itself — poisons the cache
  for every later caller in the process. `SequenceFeature.get_df_feat(accept_gaps=True)`
  similarly rewrites the caller's `df_parts` in place.
- **Cryptic crashes on valid edge inputs.** `read_fasta` raises
  `KeyError('sequence')` on a leading blank line; `SeqMut.mutate(df_feat=…)`
  raises a length-mismatch `ValueError` when two entries share a mutation label;
  `CPP.eval` raises `KMeans(n_clusters=0)` on a single-feature set.
- **Dead validation / wrong output.** Several `options`/plotting guards never
  fire (or always fire) due to bool-vs-str, off-by-one, or reachability bugs, and
  several validators report the wrong parameter name.

## Goal

Fix all items below with a regression test each; **default output byte-identical**
(regression-tested), no regolden, no new dependencies.

## Requirements

Backend-trusts-frontend and the validation-block conventions apply throughout;
add an explicit category to every touched `warnings.warn`.

**Defensive copies (shared-object mutation)**
- [ ] `data_handling/_load_dataset.py` — copy the cached frame before the
      non-canonical-AA gap substitution (mirror the Overview branch's `.copy()`).
- [ ] `data_handling/_load_features.py` — return `df_feat.copy()`.
- [ ] `feature_engineering/_backend/check_feature.py` (`check_match_df_parts_df_scales`)
      — copy `df_parts` before the gap `str.replace`, so `get_df_feat` no longer
      mutates the caller.

**Crash-on-edge-input**
- [ ] `data_handling/_backend/parse_fasta.py` — skip blank lines / raise a clear
      `ValueError` when a sequence line precedes the first header.
- [ ] `protein_design/_seqmut.py` (`mutate`) — re-join scored results on
      `(entry, mutation)` (or original row order), not the entry-less mutation label.
- [ ] `feature_engineering/_backend/cpp/cpp_eval.py` (`get_best_n_clusters`) —
      clamp `max_n = max(1, min(100, n_features-1))` and short-circuit `n_features<=1`.

**Numeric**
- [ ] `seq_analysis_pro/_backend/comp_seq_sim.py` — self-similarity diagonal → `100`
      (matches the `[0,100]` scale of every off-diagonal cell), not `1`.

**config.options validation** *(CONFIRM-FIRST — see checklist)*
- [ ] `config.py` — validate the incoming *candidate* for `verbose`/`random_state`
      (not the current global); validate `name_tmd`; stop leaving
      `LOKY_MAX_CPU_COUNT=1` set permanently once `allow_multiprocessing` is re-enabled.

**Plotting / display dead-branch & label bugs**
- [ ] `_utils/plotting.py` (`_check_marker_size`) — move the length check into the
      list branch so a wrong-length list raises `ValueError`, not a bare `IndexError`.
- [ ] `show_html/_display_df.py` (`_check_show`) — `max_val = n-1` for the row/col selector.
- [ ] `plotting/_plot_settings.py` — set `axes.linewidth`/tick widths in the
      `weight_bold` branch too (bold must not render thinner than normal).
- [ ] `_backend/cpp/cpp_plot_feature_map.py` — `show_only_max = imp_bar_label_type != "long"`
      (currently a bool-vs-str compare, always True).
- [ ] `_backend/cpp/cpp_plot_profile.py` — `max(0, …)` y-padding (currently `min(0, …)`, always 0).
- [ ] `plotting/_plot_legend.py` — fix the `name=` / dict keys so validation errors
      report the real parameter.

**Validator messages / warnings hygiene**
- [ ] `_aaclust.py` (`check_match_X_n_clusters`) — correct the reversed inequality in the message.
- [ ] `_backend/check_aaclust.py` (`check_metric`) — drop "None or" from the message
      (or accept None; see the AAclust decision issue).
- [ ] `data_handling/_seq_preproc.py` — `min_window_size = slide_stop - slide_start`
      (typo `slide_stop - slide_stop` makes the guard dead).
- [ ] `data_handling/_load_scales.py` — `name="unclassified_out"` (currently `unclassified_in`).
- [ ] `seq_analysis_pro/_comp_seq_sim.py` — close the paren in the `name=` f-string.
- [ ] Add explicit categories to the bare `warnings.warn` in `_load_dataset.py`,
      `_read_fasta.py`.

## KPIs / Acceptance criteria
- [ ] Each fix carries ≥1 regression test (identity / mutation-leak test for the
      copy fixes; crash→clean-error tests for the edge inputs; guard-fires tests
      for the dead branches).
- [ ] Default numeric output **byte-identical** to current (regression-tested); no
      regolden; full `pytest tests` green on the Linux + Windows floors.
- [ ] No `print()`; bare `ValueError`/`RuntimeError`; no `aaanalysis._utils.*`
      imports outside `utils.py`.

## Scope / non-goals
- **Output-changing defects are excluded** and tracked separately (CPP redundancy
  filter, TreeModel per-round seeding, BH p-value monotonicity) — those need an
  ADR-0032 tier + regolden.
- **No performance changes** — the performance program is closed (ADR-0033).

## Standards checklist
- [x] Frontend/backend split honored; backend trusts frontend
- [ ] **CONFIRM-FIRST: `config.py` is touched** — the `options` validation fixes
      require maintainer sign-off before edit.
- [ ] tests (unit; identity/golden where apt); reproducibility contract respected
- [x] no `print()`; bare exceptions; no cross-package `_utils` imports
- [x] no public-symbol / `__all__` change


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix batch of low-risk correctness bugs: defensive DataFrame copies, config/plotting validation gaps, validator messages #342

Problem

Goal

Requirements

KPIs / Acceptance criteria

Scope / non-goals

Standards checklist

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Fix batch of low-risk correctness bugs: defensive DataFrame copies, config/plotting validation gaps, validator messages #342

Description

Problem

Goal

Requirements

KPIs / Acceptance criteria

Scope / non-goals

Standards checklist

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions