MILC interface: Exact current #1633
maddyscientist
left a comment
This looks good to me. I'll leave @weinbe2 to comment on the MILC interface specifics. I've left a couple of comments.
Thanks for the tweaks @leonhostetler. I've left a couple of non-essential comments that might be nice to update if you make another push. @weinbe2 any thoughts before we merge?
Thanks for the PR, @leonhostetler, this largely looks good. I have a few questions/thoughts:
No other feedback; if you've verified that this works with the current workflow in MILC, I trust you. It could be useful to document this on the QUDA Wiki in case other people want to use it, but the PR doesn't need to be blocked on that.
…from a single precision MILC run
…igenvector D2H
Replace the per-eigenvector host post-processing in qudaExactCurrent with device-side accumulation. The old loop copied each staggered contraction (one complex/site) to the host and summed scaled imaginary parts on the CPU -- ~4*n_evec*2 tiny latency-bound D2H transfers, plus a hand-written reinterpret_cast of a raw void* buffer.
Instead, wrap the two halves of the contraction output (even/odd parity) as single-parity nColor=1, nSpin=1 reference fields and sum the scaled complex contractions on the device with blas::axpy. The imaginary part is extracted once at the end into MILC's interleaved jlow arrays, reducing the device->host traffic to 8 (or 16) larger reads.
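For illustration only, the accumulate-then-extract pattern described in this commit message can be sketched in plain host C++. This is not the QUDA code: `axpy` here is a stand-in for the device-side `blas::axpy`, and `extract_imag` together with the `site * n_dir + mu` layout of the output array are hypothetical, chosen only to show the idea of summing the scaled complex contractions first and taking the imaginary part once at the end.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Stand-in for a device-side axpy: y += a * x, one complex number per site.
void axpy(double a, const std::vector<cplx>& x, std::vector<cplx>& y)
{
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
}

// Hypothetical extraction step: write Im(accum[site]) for direction mu into an
// interleaved real array j, assumed here to be laid out as j[site * n_dir + mu].
void extract_imag(const std::vector<cplx>& accum, int mu, int n_dir, std::vector<double>& j)
{
  assert(mu >= 0 && mu < n_dir);
  assert(j.size() == accum.size() * static_cast<std::size_t>(n_dir));
  for (std::size_t site = 0; site < accum.size(); ++site)
    j[site * n_dir + mu] = accum[site].imag();
}
```

In this sketch the loop over eigenvectors would call `axpy` once per contraction with that eigenvector's scale factor, and `extract_imag` would run once per direction after the loop, so the number of copies out of the accumulator no longer grows with the number of eigenvectors.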
Thanks for reviewing this, @weinbe2 !
I agree. On the one hand, in practice, I don't know that we have any workflows where we load a completely different gauge config during a run. In principle though, we should probably guard against that possibility. As for the heavy quarks, yes, I figured that since the eigenvectors from the non-Naik'ed operator were already being used as an approximation to the real eigenvectors, then why not the eigenvalues as well? For deflated CG, I figured/assumed this would be fine in that they should be a good approximation and the CG should correct any errors. For other uses of the eigenvalues (as in
I had neglected to test this after recent changes because an earlier investigation of single precision for our exact current calculations showed that it was not feasible for us. It was simply not precise enough. Nevertheless, I should've made sure that it works in principle. After fixing a number of bugs (b35fe11, a132898, b2b5463) it works now. I have also taken the opportunity to make a few more updates to The commit 8357c77 does a refactoring of
weinbe2
left a comment
This all looks good now, thank you for checking the precision = 1 builds. It sounds like it indeed helped make things more robust!
This is good to merge from my end.
This is the second and final PR in the series to enable exact current calculations for HVP in the MILC/QUDA interface. The previous PR was #1590.
Changes:
Adds a new MILC interface function, qudaExactCurrent, in lib/milc_interface.cpp. This computes the exact low-mode contribution to the current densities needed for HVP current calculations.
Implements eigenvalue shifting in the MILC interface for preserved deflation spaces. Previously, when using a preserved deflation space across multiple solves, the eigenvalues were recomputed each time the mass changed. With this update, computeEvals is called only once, during the first solve. The resulting zero-mass eigenvalues are cached and shifted as needed for subsequent masses. This avoids many unnecessary mat-vecs and greatly reduces output volume: eigenvalues are now printed only once instead of once per solve, which previously could produce O(10^6) lines of eigenvalue output in a typical production run.
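As a rough illustration of the eigenvalue shifting (a minimal sketch, not the code in this PR): if the cached values are eigenvalues of the zero-mass staggered normal operator, a mass change adds the same constant to every eigenvalue, so no mat-vecs are needed. The helper name `shift_evals` is hypothetical, and the default factor of 4 assumes the operator is normalized as 2m + D, so that the normal operator is 4m^2 - D^2; with a different normalization the factor changes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of QUDA's API): shift cached zero-mass
// eigenvalues to a new mass without recomputing them.
// Assumes lambda(m) = lambda(0) + norm * m^2, with norm = 4 for an operator
// normalized as 2m + D. The normalization is an assumption of this sketch.
std::vector<double> shift_evals(const std::vector<double>& evals_zero_mass, double mass,
                                double norm = 4.0)
{
  assert(norm > 0.0);
  const double shift = norm * mass * mass;
  std::vector<double> shifted(evals_zero_mass.size());
  for (std::size_t i = 0; i < evals_zero_mass.size(); ++i)
    shifted[i] = evals_zero_mass[i] + shift;
  return shifted;
}
```

Keeping the zero-mass values in the cache and shifting into a fresh vector for each solve avoids compounding shifts when the mass changes several times during a run.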
Makes lib/eig_block_trlm.cpp and lib/eig_trlm.cpp slightly more verbose by printing progress notifications every 10th or 100th Lanczos step. On production-size lattices, eigensolves can otherwise run for more than an hour without any visible progress, which can make a normally progressing job appear to be stalled.
Testing: