Memory-efficient chunked pairwise metric computation#151

Open
HAM41 wants to merge 1 commit into EnnyvanBeest:main from HAM41:chunked-pairwise-metrics

Conversation


HAM41 commented Mar 30, 2026

Summary

The pairwise metric functions in metric_functions.py (get_Euclidean_dist, get_recentered_euclidean_dist) materialize full (N, waveidx, flips, N) intermediate arrays via np.tile. For large unit counts (N > 5,000), these arrays consume hundreds of GB. For example, N=14,000 units requires ~500 GB just for the tiled intermediates in get_recentered_euclidean_dist, making the pipeline impractical for large-scale Neuropixels recordings with many sessions.
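A back-of-envelope calculation illustrates the scale of the problem. The `waveidx` length and flip count below are illustrative assumptions, not values taken from the repo:

```python
# Size of ONE tiled intermediate of shape (N, waveidx, flips, N) in float64.
# W (len(waveidx)) and F (number of flips) are assumed values for illustration.
N, W, F = 14_000, 41, 2
gb = N * W * F * N * 8 / 1e9        # bytes -> GB
print(round(gb))                    # on the order of 130 GB for a single array
```

Several such intermediates (the tiles for both sides of the comparison plus their difference) are alive at once, which is how the total climbs into the hundreds of GB.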

This PR adds two chunked functions that fuse the producer and consumer steps (get_Euclidean_dist + centroid_metrics and get_recentered_euclidean_dist + recentered_metrics) and process units in configurable row-blocks. The chunked functions never materialize the full 4D intermediate, only a (3, chunk_size, N, waveidx, flips) working array per iteration.

Changes

  • metric_functions.py: Add get_euclidean_metrics_chunked() and get_recentered_metrics_chunked()
  • overlord.py: Use chunked functions in extract_metric_scores()
  • default_params.py: Add chunk_size parameter (default 500)
  • tests/test_chunked_metrics.py: Tests confirming numerical equivalence between old and new code paths

Original functions are preserved for backward compatibility.

Benchmark

Tested on a probe with 7,439 units across 26 sessions (same data, same hardware):

| | Before | After | Improvement |
| --- | --- | --- | --- |
| Peak memory (MaxRSS) | 321 GB | 89 GB | 3.6x |
| Total runtime | 1h 40m | 27m | 3.7x |
| Self-match rate | 0.8330 | 0.8330 | Identical |

Additionally, probes with 11,000 to 14,000 units that previously OOM'd at 768 GB now complete at 300 to 400 GB.

How it works

Both functions follow the same pattern:

  1. Extract CV0 and CV1 data at waveidx timepoints: shape (3, N, len_waveidx, n_flips)
  2. Loop over row-chunks of CV0 (default chunk_size=500):
    • Broadcast (3, chunk, 1, waveidx, flips) - (3, 1, N, waveidx, flips) to (3, chunk, N, waveidx, flips)
    • Compute Euclidean norm over spatial dims: (chunk, N, waveidx, flips)
    • Reduce to the needed metric (peak distance, variance, mean) and write into (N, N) output
    • Delete intermediates before next iteration
  3. Apply the same rescaling as the original functions
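The loop above can be sketched roughly as follows. The layout, the function name `chunked_pairwise_metric`, and the final reduction (mean over timepoints, min over flips) are illustrative stand-ins, not the repo's actual metrics:

```python
import numpy as np

def chunked_pairwise_metric(pos, chunk_size=500):
    """pos: (3, N, W, F) spatial coords per unit, timepoint, flip (toy layout)."""
    _, N, W, F = pos.shape
    out = np.empty((N, N))
    cv1 = pos[:, None, :, :, :]                    # (3, 1, N, W, F), kept whole
    for start in range(0, N, chunk_size):
        stop = min(start + chunk_size, N)
        cv0 = pos[:, start:stop, None, :, :]       # (3, chunk, 1, W, F)
        diff = cv0 - cv1                           # broadcasts to (3, chunk, N, W, F)
        dist = np.linalg.norm(diff, axis=0)        # Euclidean norm over spatial dims
        # illustrative reduction: mean over timepoints, best (min) over flips
        out[start:stop] = dist.mean(axis=2).min(axis=2)
        del diff, dist                             # drop intermediates before next chunk
    return out
```

Peak extra memory is the `(3, chunk, N, W, F)` working array, so halving `chunk_size` roughly halves the memory overhead at the cost of more loop iterations.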

The chunk_size parameter is tunable via param['chunk_size'] and defaults to 500 (backward compatible via param.get()).
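The backward-compatible lookup amounts to the standard dict fallback pattern (assuming `param` is a plain dict, as the `param.get()` mention suggests):

```python
param = {}                                   # an older config that omits the key
chunk_size = param.get('chunk_size', 500)    # falls back to the default
```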

Test plan

  • test_euclidean_metrics_match: Chunked output matches original 3-step pipeline (np.allclose, rtol=1e-10)
  • test_recentered_metrics_match: Chunked output matches original 2-step pipeline
  • test_chunk_size_edge_cases: Consistent results with chunk_size=1 vs chunk_size=9999
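The shape of these equivalence tests can be sketched with a self-contained toy metric (`toy_metric` is a hypothetical stand-in, not a function from the repo):

```python
import numpy as np

def toy_metric(pos, chunk_size=None):
    # stand-in for the real functions: chunk_size=None builds the full
    # (3, N, N, W, F) intermediate; otherwise rows are processed in blocks
    _, N, W, F = pos.shape
    if chunk_size is None:
        diff = pos[:, :, None] - pos[:, None]
        return np.linalg.norm(diff, axis=0).mean(axis=(2, 3))
    out = np.empty((N, N))
    for s in range(0, N, chunk_size):
        e = min(s + chunk_size, N)
        out[s:e] = np.linalg.norm(
            pos[:, s:e, None] - pos[:, None], axis=0).mean(axis=(2, 3))
    return out

pos = np.random.default_rng(0).normal(size=(3, 10, 7, 2))
ref = toy_metric(pos)                                    # unchunked path
assert np.allclose(ref, toy_metric(pos, chunk_size=1), rtol=1e-10)
assert np.allclose(ref, toy_metric(pos, chunk_size=9999), rtol=1e-10)
```

A `chunk_size` larger than `N` degenerates to a single iteration, so the 9999 case exercises the same code path as one big chunk.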

Commit message

The existing metric pipeline materializes full (N, waveidx, flips, N) arrays
via np.tile when computing pairwise Euclidean distances, requiring 500+ GB
for N > 10k units. This adds two chunked functions that fuse the producer and
consumer steps and process units in configurable row-blocks, reducing peak
memory by ~3.6x and runtime by ~3.7x while producing numerically identical
results.

Benchmark on 7,439 units / 26 sessions (A327 probe 19076606401):
  - Memory: 321 GB -> 89 GB
  - Runtime: 1h 40m -> 27m
  - Self-match rate: 0.8330 (identical)
