Memory-efficient chunked pairwise metric computation #151
Open
HAM41 wants to merge 1 commit into EnnyvanBeest:main
Conversation
The existing metric pipeline materializes full `(N, waveidx, flips, N)` arrays via `np.tile` when computing pairwise Euclidean distances, requiring 500+ GB for N > 10k units. This adds two chunked functions that fuse the producer and consumer steps and process units in configurable row-blocks, reducing peak memory by ~3.6x and runtime by ~3.7x while producing numerically identical results.

Benchmark on 7,439 units / 26 sessions (A327 probe 19076606401):

- Memory: 321 GB -> 89 GB
- Runtime: 1h 40m -> 27m
- Self-match rate: 0.8330 (identical)
Summary
The pairwise metric functions in `metric_functions.py` (`get_Euclidean_dist`, `get_recentered_euclidean_dist`) materialize full `(N, waveidx, flips, N)` intermediate arrays via `np.tile`. For large unit counts (N > 5,000), these arrays consume hundreds of GB. For example, N = 14,000 units requires ~500 GB just for the tiled intermediates in `get_recentered_euclidean_dist`, making the pipeline impractical for large-scale Neuropixels recordings with many sessions.

This PR adds two chunked functions that fuse the producer and consumer steps (`get_Euclidean_dist` + `centroid_metrics` and `get_recentered_euclidean_dist` + `recentered_metrics`) and process units in configurable row-blocks. The chunked functions never materialize the full 4D intermediate, only a `(3, chunk_size, N, waveidx, flips)` working array per iteration.

Changes
- `metric_functions.py`: Add `get_euclidean_metrics_chunked()` and `get_recentered_metrics_chunked()`
- `overlord.py`: Use chunked functions in `extract_metric_scores()`
- `default_params.py`: Add `chunk_size` parameter (default 500)
- `tests/test_chunked_metrics.py`: Tests confirming numerical equivalence between old and new code paths

Original functions are preserved for backward compatibility.
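As a rough sanity check on the memory figures above, the size of one fully tiled intermediate can be estimated. The helper below is illustrative only: the waveform-index length, flip count, and dtype are assumptions, not values read from the repo.

```python
import numpy as np

def tiled_intermediate_gb(n_units, len_waveidx=100, n_flips=2,
                          n_coords=3, dtype=np.float32):
    """Estimate the size in GB of one tiled
    (n_coords, N, len_waveidx, n_flips, N) intermediate array.
    len_waveidx, n_flips, and dtype are illustrative assumptions."""
    n_elem = n_coords * n_units * len_waveidx * n_flips * n_units
    return n_elem * np.dtype(dtype).itemsize / 1e9

# Memory grows quadratically in N: doubling the unit count
# roughly quadruples the tiled intermediate.
print(f"{tiled_intermediate_gb(7_439):.0f} GB")
print(f"{tiled_intermediate_gb(14_000):.0f} GB")
```

With these assumed shapes the estimate lands in the same ballpark as the ~500 GB quoted for N = 14,000, which is why chunking the leading unit axis is the natural fix.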
Benchmark
Tested on a probe with 7,439 units across 26 sessions (same data, same hardware):

- Peak memory: 321 GB -> 89 GB
- Runtime: 1h 40m -> 27m
- Self-match rate: 0.8330 (identical)
Additionally, probes with 11,000 to 14,000 units that previously OOM'd at 768 GB now complete at 300 to 400 GB.
How it works
Both functions follow the same pattern:
1. Precompute per-unit arrays at the `waveidx` timepoints: shape `(3, N, len_waveidx, n_flips)`
2. For each row-block, broadcast `(3, chunk, 1, waveidx, flips) - (3, 1, N, waveidx, flips)` to `(3, chunk, N, waveidx, flips)`
3. Reduce over the leading spatial axis to `(chunk, N, waveidx, flips)`
4. Write the block's rows into the `(N, N)` output
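The pattern above can be sketched as a standalone function. This is an illustrative version, not the repo's actual `get_euclidean_metrics_chunked()`; the array layout and names are assumptions based on the shapes listed in this PR.

```python
import numpy as np

def chunked_pairwise_dist(pos, chunk_size=500):
    """Pairwise Euclidean distance over (3, N, waveidx, flips) inputs,
    processed in row-blocks so only a (3, chunk, N, waveidx, flips)
    working array exists at any time. Illustrative sketch."""
    _, n_units, n_wave, n_flips = pos.shape
    out = np.empty((n_units, n_units, n_wave, n_flips), dtype=pos.dtype)
    for start in range(0, n_units, chunk_size):
        stop = min(start + chunk_size, n_units)
        # Broadcast (3, chunk, 1, w, f) - (3, 1, N, w, f) -> (3, chunk, N, w, f)
        diff = pos[:, start:stop, None] - pos[:, None, :]
        # Reduce over the spatial axis -> (chunk, N, w, f)
        out[start:stop] = np.sqrt((diff ** 2).sum(axis=0))
    return out

# Equivalence check against the all-at-once broadcast version.
rng = np.random.default_rng(0)
pos = rng.normal(size=(3, 40, 5, 2)).astype(np.float32)
full = np.sqrt(((pos[:, :, None] - pos[:, None, :]) ** 2).sum(axis=0))
assert np.allclose(chunked_pairwise_dist(pos, chunk_size=7), full)
```

Because each block performs the same operations in the same order as the full broadcast, the fused version is numerically identical, not merely close.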
The `chunk_size` parameter is tunable via `param['chunk_size']` and defaults to 500 (backward compatible via `param.get()`).

Test plan
- `test_euclidean_metrics_match`: Chunked output matches original 3-step pipeline (`np.allclose`, `rtol=1e-10`)
- `test_recentered_metrics_match`: Chunked output matches original 2-step pipeline
- `test_chunk_size_edge_cases`: Consistent results with `chunk_size=1` vs `chunk_size=9999`
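A minimal version of the edge-case check might look like the following. The helper names here are placeholders, not the repo's actual test functions; the point is that a chunk of 1 (one row at a time) and an oversized chunk (one pass over everything) must agree with the unchunked reference.

```python
import numpy as np

def dist_full(pos):
    """Reference: materialize the full broadcast difference at once."""
    return np.sqrt(((pos[:, :, None] - pos[:, None, :]) ** 2).sum(axis=0))

def dist_chunked(pos, chunk_size):
    """Same computation, one row-block at a time."""
    n = pos.shape[1]
    out = np.empty((n, n) + pos.shape[2:], dtype=pos.dtype)
    for s in range(0, n, chunk_size):
        e = min(s + chunk_size, n)
        block = pos[:, s:e, None] - pos[:, None, :]
        out[s:e] = np.sqrt((block ** 2).sum(axis=0))
    return out

def test_chunk_size_edge_cases():
    pos = np.random.default_rng(1).normal(size=(3, 23, 4, 2))
    ref = dist_full(pos)
    for cs in (1, 9999):
        assert np.allclose(dist_chunked(pos, cs), ref, rtol=1e-10)

test_chunk_size_edge_cases()
```

Using a deliberately non-divisible unit count (23 here) also exercises the final partial block, which is where off-by-one slicing bugs typically hide.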