Update scph.cpp - Reducing memory usage in SCPH V4 kernels (kpoint/band) via temporary reuse and in-place MPI reduction #305
Open
andersonprizzi wants to merge 2 commits into ttadano:develop
Conversation
Optimize memory usage in `compute_V4_elements_mpi_over_kpoint` and `compute_V4_elements_mpi_over_band` by reusing temporaries and using in-place MPI reduction. This refactor preserves the same contractions, keeping results unchanged within floating-point roundoff.
This pull request reduces peak memory usage in the SCPH quartic matrix-element routines:
- `Scph::compute_V4_elements_mpi_over_kpoint`
- `Scph::compute_V4_elements_mpi_over_band`

The refactor reuses temporary buffers and removes an extra per-rank MPI staging tensor. The algebraic contractions and index transforms are preserved, and results remain unchanged within floating-point roundoff.
Changes:
In both functions:
- The intermediate `v4_mpi` buffer has been removed. Each MPI rank writes its local contributions directly into `v4_out`, and the final accumulation is performed with `MPI_Allreduce(MPI_IN_PLACE, &v4_out[0][0][0], ...)`. This avoids keeping two full copies of the V4 tensor per rank. A minimal sketch of the pattern follows below.
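A minimal sketch of the in-place reduction, assuming hypothetical extent names (`nk2`, `ns2`) and a standard MPI datatype constant; the PR text only shows the `MPI_Allreduce(MPI_IN_PLACE, ...)` call itself:

```cpp
#include <mpi.h>
#include <complex>
#include <cstddef>

// Sketch only: before this change, each rank filled a separate v4_mpi
// staging tensor and reduced it into v4_out. Now each rank zero-fills
// v4_out, writes only its own contributions, and all ranks sum their
// partial tensors in place -- no second full copy of V4 per rank.
void reduce_v4_in_place(std::complex<double> ***v4_out,
                        const std::size_t nk2, const std::size_t ns2)
{
    // Assumes v4_out was allocated as one contiguous nk2 x ns2 x ns2
    // block, so &v4_out[0][0][0] addresses the whole tensor. The actual
    // code may use a different datatype constant; a count above INT_MAX
    // would need chunking.
    MPI_Allreduce(MPI_IN_PLACE, &v4_out[0][0][0],
                  static_cast<int>(nk2 * ns2 * ns2),
                  MPI_CXX_DOUBLE_COMPLEX, MPI_SUM, MPI_COMM_WORLD);
}
```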
In `compute_V4_elements_mpi_over_kpoint`:

- The original implementation allocated several large temporary buffers at once, even though only two are needed at any given step. This refactor therefore reuses two complex `ns2 x ns2` buffers across the successive index transformations, as sketched below.
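The two-buffer "ping-pong" idea, sketched under assumed names. `transform_stage` here is a generic stand-in contraction (the real kernel contracts one phonon index at a time, with eigenvector matrices that differ per index and k-point; one matrix keeps the sketch short):

```cpp
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cvec = std::vector<std::complex<double>>; // flattened ns2 x ns2 matrix

// Stand-in for one index transformation: dst = evec^H * src, contracting
// the leading index of src against the eigenvector matrix.
static void transform_stage(const cvec &evec, const cvec &src, cvec &dst,
                            const std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            std::complex<double> acc{0.0, 0.0};
            for (std::size_t k = 0; k < n; ++k)
                acc += std::conj(evec[k * n + i]) * src[k * n + j];
            dst[i * n + j] = acc;
        }
    }
}

// Two buffers cover all four stages: each stage reads *src, writes *dst,
// then the pointers swap so this stage's output feeds the next stage.
// Previously, four ns2 x ns2 temporaries coexisted.
void apply_all_stages(const cvec &evec, cvec &buf_a, cvec &buf_b,
                      const std::size_t ns2)
{
    cvec *src = &buf_a; // buf_a initially holds the raw V4 block
    cvec *dst = &buf_b;
    for (int stage = 0; stage < 4; ++stage) {
        transform_stage(evec, *src, *dst, ns2);
        std::swap(src, dst);
    }
    // After an even number of swaps, the final result sits back in buf_a.
}
```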
In `compute_V4_elements_mpi_over_band`:

- Memory usage is reduced by replacing `v4_tmp0` with a compact sparse representation of the non-zero φ4 elements, stored as (row, col, value) entries in `phi4_array` along with a `col_ptr` offset array used to iterate efficiently over the non-zeros of each column during the first-index transformation (preserving the original access pattern). This can significantly reduce memory usage when the number of non-zero entries is much smaller than `ns2 x ns2`; see the sketch after this list.
- In addition, the temporary workspace is reduced by reusing only `v4_tmp1` and `v4_tmp2` in a ping-pong manner, instead of allocating `v4_tmp1`, `v4_tmp2`, `v4_tmp3`, and `v4_tmp4` simultaneously.
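A sketch of the sparse layout and the first-index loop. The PR names `phi4_array` and `col_ptr`; everything else here is hypothetical, and the contraction is a generic stand-in rather than the scph.cpp kernel:

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// (row, col, value) storage for the non-zero phi4 elements. Entries are
// sorted by column, and col_ptr holds ns2 + 1 offsets so that column c's
// non-zeros occupy the half-open range [col_ptr[c], col_ptr[c + 1]).
// Memory is O(nnz) instead of the dense ns2 * ns2 of v4_tmp0.
struct Phi4Entry {
    std::size_t row, col;
    std::complex<double> val;
};

// First-index transformation touching only the stored non-zeros,
// column by column (preserving the original access pattern):
//   dst(i, c) += evec(r, i) * phi4(r, c)  for each non-zero (r, c).
// dst is assumed zero-initialized, flattened ns2 x ns2.
void transform_first_index(const std::vector<Phi4Entry> &phi4_array,
                           const std::vector<std::size_t> &col_ptr,
                           const std::vector<std::complex<double>> &evec,
                           std::vector<std::complex<double>> &dst,
                           const std::size_t ns2)
{
    for (std::size_t c = 0; c < ns2; ++c) {
        for (std::size_t idx = col_ptr[c]; idx < col_ptr[c + 1]; ++idx) {
            const std::size_t r = phi4_array[idx].row;
            for (std::size_t i = 0; i < ns2; ++i)
                dst[i * ns2 + c] += evec[r * ns2 + i] * phi4_array[idx].val;
        }
    }
}
```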
These changes were motivated by the fact that, for large `nk` and/or `ns`, the V4 kernels can dominate memory usage due to multiple `ns2 x ns2` temporary buffers and duplicated tensors per MPI rank, which can limit the maximum feasible system size.

Tests were performed by running an SCPH calculation with a representative input deck (a test case including the IFCs and the associated strain/force dataset). The baseline version and this PR were run with the same MPI/OMP configuration, and the resulting outputs showed no differences.