Optimize interaction matrix scoring kernel by gwcmowry · Pull Request #13 · idptools/finches

gwcmowry · 2026-04-04T17:22:31Z

Summary

This PR speeds up the core FINCHES interaction-matrix scoring path while preserving existing outputs up to expected
floating-point differences and keeping the current test structure largely intact.

Main changes:

replace the sliding-window matrix_scan implementation with a prefix-sum based scan
move charge-mask generation into the existing Cython extension
vectorize aliphatic-mask generation
compute epsilon directly from the weighted-matrix transform rather than recomputing equivalent intermediate work

Why

Profiling pointed to three main costs in the pair-scoring path:

charge-weight mask generation spent substantial time in Python loops
aliphatic-mask generation also used Python nested loops
matrix_scan dominated runtime because it looped over every window and then over every cell inside each window

The biggest remaining bottleneck after mask optimization was matrix_scan, so the main algorithmic change here is
replacing the per-window nested-loop scan with a prefix-sum / integral-image approach.

Implementation details

finches/utils/matrix_manipulation.pyx

added a compiled charge_weighted_mask(...)
replaced the old matrix_scan(...) implementation with a prefix-sum based scan

For a fixed window size, the old implementation effectively did work proportional to the number of windows times the
number of cells per window, while the new version transforms the matrix once, builds a prefix sum once, and scores
each window in constant time.

finches/parsing_aminoacid_sequences.py

get_charge_weighted_mask(...) now delegates to the compiled implementation
get_aliphatic_weighted_mask(...) now uses a vectorized table lookup instead of nested Python loops

finches/epsilon_calculation.py

calculate_epsilon_value(...) now computes directly from the weighted matrix transform
added small compatibility wrappers/aliases used by the current callers/tests:
- Interaction_Matrix_Constructor
- get_attractive_repulsive_matrixes(...)
- mask_matrix(...)
- masked_matrix(...)
- flatten_matrix_to_vector(...)
- get_sequence_epsilon_vectors(...)

These wrappers are intended as compatibility shims so the optimization refactor does not break existing callers or
the current test suite.

finches/forcefields/mpipi.py

preserved mpipi_model(...) as a compatibility alias for existing callers/tests

Speedups

On representative synthetic sequence-pair benchmarks spanning 32x32 to 512x512 inputs, the optimized scorer matched
prior outputs up to floating-point noise and produced the following end-to-end speedups:

32x32: 0.680 ms -> 0.113 ms (6.0x)
64x64: 3.126 ms -> 0.220 ms (14.2x)
128x128: 12.413 ms -> 0.607 ms (20.5x)
256x256: 56.605 ms -> 2.080 ms (27.2x)
512x512: 267.734 ms -> 11.131 ms (24.1x)

Mask-level improvements were also large:

charge mask: up to about 260x faster
aliphatic mask: up to about 12x faster
weighted matrix build: up to about 8.7x faster

The largest win came from the scan change itself, since matrix_scan was the dominant remaining bottleneck.

Tests

I kept the existing test layout and comments as intact as possible and only made the minimal fixes needed for pytest
to run reliably from the repo root.

That included:

fixing a broken enumerate(...) case
fixing one stale symbol reference in test_FH_diagrams.py
making fixture paths explicit relative to the test file
refreshing the checked-in .npz fixtures under finches/tests/test_data/

Refreshed .npz fixtures were added to finches/tests/test_data/ so the existing regression tests run from the repo as
checked out; they are intended to restore self-contained test coverage, not to introduce a new expected behavior
baseline beyond floating-point-equivalent outputs from the refactor.

Test command used:

PYTHONPATH=.:/tmp/pytest_vendor MPLCONFIGDIR=/tmp/matplotlib HOME=/tmp
conda run --no-capture-output -n finches_clean python -m pytest finches/tests -q

Result:

11 passed in 2.23s

Copilot

Pull request overview

This PR optimizes FINCHES’ core interaction-matrix scoring pipeline by moving key masking work into compiled code, replacing the sliding-window scan with a prefix-sum (integral image) approach, and adding compatibility shims so existing callers/tests continue to work with minimal changes.

Changes:

Replaced matrix_scan(...)’s per-window nested loops with a prefix-sum based scan in the Cython extension.
Added a compiled charge_weighted_mask(...) and switched charge-mask generation to use it; vectorized aliphatic-mask generation.
Updated epsilon calculation to compute directly from the weighted-matrix transform and introduced compatibility aliases/wrappers to preserve existing APIs and tests.

Reviewed changes

Copilot reviewed 7 out of 12 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
`finches/utils/matrix_manipulation.pyx`	Adds compiled charge mask generation and rewrites `matrix_scan` using an integral-image/prefix-sum method.
`finches/parsing_aminoacid_sequences.py`	Delegates charge mask to the Cython function and vectorizes aliphatic mask generation via table lookup.
`finches/epsilon_calculation.py`	Computes epsilon directly from the transformed weighted matrix; adds compatibility wrapper functions and aliases.
`finches/forcefields/mpipi.py`	Adds `mpipi_model(...)` compatibility alias with version-string mapping.
`finches/tests/test_FH_diagrams.py`	Fixes imports and uses the new `mpipi_model` alias.
`finches/tests/test_epsilon_calculation.py`	Makes test-data paths robust, fixes iteration bug, updates fixture key usage, and adds a reference test validating `matrix_scan` window scores.
`finches/tests/test_data/update_test_data.py`	Updates fixture-writing script (pathing, deterministic RNG, and fixture key naming).
`finches/tests/test_data/mPiPi_GGv1_seq_epsilon_and_vectors.npz`	Refreshed fixture data to match refactor outputs (within expected FP differences).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

foxrafa · 2026-04-04T20:14:22Z

Optimize interaction matrix scoring kernel

0fdf10d

Copilot AI review requested due to automatic review settings April 4, 2026 17:22

Copilot started reviewing on behalf of gwcmowry April 4, 2026 17:23 View session

Copilot AI reviewed Apr 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize interaction matrix scoring kernel#13

Optimize interaction matrix scoring kernel#13
gwcmowry wants to merge 1 commit into
idptools:mainfrom
gwcmowry:perf/optimize-imc-kernel

gwcmowry commented Apr 4, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

foxrafa commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

gwcmowry commented Apr 4, 2026

Summary

Why

Implementation details

finches/utils/matrix_manipulation.pyx

finches/parsing_aminoacid_sequences.py

finches/epsilon_calculation.py

finches/forcefields/mpipi.py

Speedups

Tests

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

foxrafa commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants