Memory-efficient chunked pairwise metric computation#151

Open
HAM41 wants to merge 1 commit into EnnyvanBeest:main from HAM41:chunked-pairwise-metrics

Conversation


HAM41 commented Mar 30, 2026

Summary

The pairwise metric functions in metric_functions.py (get_Euclidean_dist, get_recentered_euclidean_dist) materialize full (N, waveidx, flips, N) intermediate arrays via np.tile. For large unit counts (N > 5,000), these arrays consume hundreds of GB. For example, N=14,000 units requires ~500 GB just for the tiled intermediates in get_recentered_euclidean_dist, making the pipeline impractical for large-scale Neuropixels recordings with many sessions.
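A back-of-envelope calculation illustrates the scale of the problem. The `waveidx` length and flip count below are illustrative assumptions, not values taken from the repo:

```python
# Size of ONE tiled intermediate of shape (N, waveidx, flips, N) in float64.
# W (len(waveidx)) and F (number of flips) are assumed values for illustration.
N, W, F = 14_000, 41, 2
gb = N * W * F * N * 8 / 1e9        # bytes -> GB
print(round(gb))                    # on the order of 130 GB for a single array
```

Several such intermediates (the tiles for both sides of the comparison plus their difference) are alive at once, which is how the total climbs into the hundreds of GB.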

This PR adds two chunked functions that fuse the producer and consumer steps (get_Euclidean_dist + centroid_metrics and get_recentered_euclidean_dist + recentered_metrics) and process units in configurable row-blocks. The chunked functions never materialize the full 4D intermediate, only a (3, chunk_size, N, waveidx, flips) working array per iteration.

Changes

  • metric_functions.py: Add get_euclidean_metrics_chunked() and get_recentered_metrics_chunked()
  • overlord.py: Use chunked functions in extract_metric_scores()
  • default_params.py: Add chunk_size parameter (default 500)
  • tests/test_chunked_metrics.py: Tests confirming numerical equivalence between old and new code paths

Original functions are preserved for backward compatibility.

Benchmark

Tested on a probe with 7,439 units across 26 sessions (same data, same hardware):

| | Before | After | Improvement |
| --- | --- | --- | --- |
| Peak memory (MaxRSS) | 321 GB | 89 GB | 3.6x |
| Total runtime | 1h 40m | 27m | 3.7x |
| Self-match rate | 0.8330 | 0.8330 | Identical |

Additionally, probes with 11,000 to 14,000 units that previously OOM'd at 768 GB now complete at 300 to 400 GB.

How it works

Both functions follow the same pattern:

  1. Extract CV0 and CV1 data at waveidx timepoints: shape (3, N, len_waveidx, n_flips)
  2. Loop over row-chunks of CV0 (default chunk_size=500):
    • Broadcast (3, chunk, 1, waveidx, flips) - (3, 1, N, waveidx, flips) to (3, chunk, N, waveidx, flips)
    • Compute Euclidean norm over spatial dims: (chunk, N, waveidx, flips)
    • Reduce to the needed metric (peak distance, variance, mean) and write into (N, N) output
    • Delete intermediates before next iteration
  3. Apply the same rescaling as the original functions
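The loop above can be sketched roughly as follows. The layout, the function name `chunked_pairwise_metric`, and the final reduction (mean over timepoints, min over flips) are illustrative stand-ins, not the repo's actual metrics:

```python
import numpy as np

def chunked_pairwise_metric(pos, chunk_size=500):
    """pos: (3, N, W, F) spatial coords per unit, timepoint, flip (toy layout)."""
    _, N, W, F = pos.shape
    out = np.empty((N, N))
    cv1 = pos[:, None, :, :, :]                    # (3, 1, N, W, F), kept whole
    for start in range(0, N, chunk_size):
        stop = min(start + chunk_size, N)
        cv0 = pos[:, start:stop, None, :, :]       # (3, chunk, 1, W, F)
        diff = cv0 - cv1                           # broadcasts to (3, chunk, N, W, F)
        dist = np.linalg.norm(diff, axis=0)        # Euclidean norm over spatial dims
        # illustrative reduction: mean over timepoints, best (min) over flips
        out[start:stop] = dist.mean(axis=2).min(axis=2)
        del diff, dist                             # drop intermediates before next chunk
    return out
```

Peak extra memory is the `(3, chunk, N, W, F)` working array, so halving `chunk_size` roughly halves the memory overhead at the cost of more loop iterations.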

The chunk_size parameter is tunable via param['chunk_size'] and defaults to 500 (backward compatible via param.get()).
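The backward-compatible lookup amounts to the standard dict fallback pattern (assuming `param` is a plain dict, as the `param.get()` mention suggests):

```python
param = {}                                   # an older config that omits the key
chunk_size = param.get('chunk_size', 500)    # falls back to the default
```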

Test plan

  • test_euclidean_metrics_match: Chunked output matches original 3-step pipeline (np.allclose, rtol=1e-10)
  • test_recentered_metrics_match: Chunked output matches original 2-step pipeline
  • test_chunk_size_edge_cases: Consistent results with chunk_size=1 vs chunk_size=9999
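The shape of these equivalence tests can be sketched with a self-contained toy metric (`toy_metric` is a hypothetical stand-in, not a function from the repo):

```python
import numpy as np

def toy_metric(pos, chunk_size=None):
    # stand-in for the real functions: chunk_size=None builds the full
    # (3, N, N, W, F) intermediate; otherwise rows are processed in blocks
    _, N, W, F = pos.shape
    if chunk_size is None:
        diff = pos[:, :, None] - pos[:, None]
        return np.linalg.norm(diff, axis=0).mean(axis=(2, 3))
    out = np.empty((N, N))
    for s in range(0, N, chunk_size):
        e = min(s + chunk_size, N)
        out[s:e] = np.linalg.norm(
            pos[:, s:e, None] - pos[:, None], axis=0).mean(axis=(2, 3))
    return out

pos = np.random.default_rng(0).normal(size=(3, 10, 7, 2))
ref = toy_metric(pos)                                    # unchunked path
assert np.allclose(ref, toy_metric(pos, chunk_size=1), rtol=1e-10)
assert np.allclose(ref, toy_metric(pos, chunk_size=9999), rtol=1e-10)
```

A `chunk_size` larger than `N` degenerates to a single iteration, so the 9999 case exercises the same code path as one big chunk.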

Commit message

The existing metric pipeline materializes full (N, waveidx, flips, N) arrays
via np.tile when computing pairwise Euclidean distances, requiring 500+ GB
for N > 10k units. This adds two chunked functions that fuse the producer and
consumer steps and process units in configurable row-blocks, reducing peak
memory by ~3.6x and runtime by ~3.7x while producing numerically identical
results.

Benchmark on 7,439 units / 26 sessions (A327 probe 19076606401):
  - Memory: 321 GB -> 89 GB
  - Runtime: 1h 40m -> 27m
  - Self-match rate: 0.8330 (identical)
