Apply space-group symmetry transform in X2C GW and HF kernels #14
Open
gauravharsha wants to merge 15 commits into
Use value_AO in copy_Gk_2c and get_dm_fbz; restore off-diagonal Madelung in add_Ewald. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
check_x2c_hubbard_symmetry mirrors the green-mbpt test: verifies that one SCF step with space-group or TR-only symmetry gives the same Sigma at every IBZ k-point as the no-symmetry full-BZ run (tol=1e-8). s-type orbitals only; SU(2) spinor transform exercised but not higher angular-momentum orbital representations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
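The consistency criterion in this commit message can be sketched as a largest-deviation check between a symmetry-reduced run and the full-BZ reference. This is a minimal illustration, not the test's actual code: `max_ibz_deviation`, the flat storage layout, and the `ibz_to_full` index map are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Largest |sigma_sym - sigma_ref| over all IBZ k-points.
//   sigma_sym   : ink blocks of block_size elements (symmetry-reduced run)
//   sigma_ref   : nk blocks of block_size elements (no-symmetry run)
//   ibz_to_full : full-BZ index of each IBZ point (hypothetical mapping)
double max_ibz_deviation(const std::vector<cplx>& sigma_sym,
                         const std::vector<cplx>& sigma_ref,
                         const std::vector<std::size_t>& ibz_to_full,
                         std::size_t block_size) {
  assert(sigma_sym.size() == ibz_to_full.size() * block_size);
  double worst = 0.0;
  for (std::size_t i = 0; i < ibz_to_full.size(); ++i) {
    const std::size_t k = ibz_to_full[i];
    assert((k + 1) * block_size <= sigma_ref.size());
    for (std::size_t e = 0; e < block_size; ++e) {
      const double d =
          std::abs(sigma_sym[i * block_size + e] - sigma_ref[k * block_size + e]);
      if (d > worst) worst = d;
    }
  }
  return worst;
}
```

A test in this style would pass when `max_ibz_deviation(...) < tol` holds for each symmetry mode.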
Pull request overview
Applies full space-group (including time-reversal) AO-basis symmetry reconstruction G(k_full) = U_k G(k_ibz) U_k† in the X2C GPU HF/GW paths, and adds regression tests to ensure self-energies are symmetry-consistent across “no symmetry”, “TR-only”, and full space-group symmetry inputs.
Changes:
- X2C HF: reconstruct full-BZ density matrix blocks via `k_symmetry().value_AO(...)` rather than manual TR-only spin-flip branching.
- X2C GW: reconstruct full-BZ Green’s function blocks via `k_symmetry().value_AO(...)` for both double and single precision overloads.
- Tests: add Hubbard+Rashba symmetry-consistency checks for both HF and GW kernels.
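The reconstruction `G(k_full) = U_k G(k_ibz) U_k†` described in the overview can be illustrated with plain loops. This is a sketch under assumptions: `transform_block` is a hypothetical helper, matrices are stored row-major, and time reversal is modeled as an elementwise conjugation of the finished product; it does not reproduce the internals of `value_AO`.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using Matrix = std::vector<cplx>;  // row-major n x n

// Returns U * G * U^dagger; if time_reversed is set, the result is
// additionally conjugated elementwise (assumed TR convention).
Matrix transform_block(const Matrix& U, const Matrix& G, std::size_t n,
                       bool time_reversed) {
  assert(U.size() == n * n && G.size() == n * n);
  Matrix UG(n * n);   // value-initialized to zero
  Matrix out(n * n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < n; ++k)
        UG[i * n + j] += U[i * n + k] * G[k * n + j];
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < n; ++k)
        out[i * n + j] += UG[i * n + k] * std::conj(U[j * n + k]);  // (U^dagger)_{kj}
  if (time_reversed)
    for (cplx& x : out) x = std::conj(x);
  return out;
}
```

With `U` set to the identity the function returns `G` unchanged (or its conjugate when `time_reversed` is true), which is a quick sanity check for the no-symmetry case.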
Reviewed changes
Copilot reviewed 3 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/cu_solver_test.cpp | Adds new Hubbard+Rashba symmetry consistency test sections for HF and GW. |
| src/hf_gpu_kernel.cpp | Updates X2C HF density-matrix reconstruction to use value_AO for full symmetry transforms. |
| src/gw_gpu_kernel.cpp | Updates X2C GW copy_Gk_2c to use value_AO for full symmetry transforms (double + float). |
Comment on lines +164 to +172

    if (scf_type == "GW") {
      green::grids::transformer_t ft(p);
      auto [kernel, solver] = green::gpu::custom_gw_kernel(true, p, nao, nso, ns, NQ, ft, bz, Sk);
      solver(G_shared, S_shared);
      result.resize(nts, ns, ink, nso, nso);
      S_shared.fence();
      if (!green::utils::context().node_rank) result << S_shared.object();
      S_shared.fence();
    } else {
Comment on lines +107 to +112

    const std::string dir = TEST_PATH + "/GW_X2C_Hubbard"s;
    const std::string df_path = dir + (scf_type == "GW" ? "/df_int"s : "/df_hf_int"s);
    const std::string grid_file = GRID_PATH + "/ir/1e4.h5"s;
    constexpr size_t ns = 1, nk = 36, nso = 4, nao = 2;
    constexpr double tol = 1e-8;
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ed ns/nk/nso/nao/ink
…ference

Mirror of the green-mbpt change: the X2C symmetry consistency check now uses a real cubic Ar (def2-svp, --x2c 2, 3x3x1 k-mesh) calculation in place of the Hubbard+Rashba model. This exercises both the orbital point-group representation (p-shells in the basis) and the SU(2) spinor part of the double-group transform, which the s-orbital Hubbard model could not reach.

nk=9; ink per run: no_symm=9, trs_only=5, full_symm=3. The reference G_tau is extracted from iter1/G_tau/data into a minimal data_<tag>.h5 file (G_tau only); Sigma1 and Sigma_tau are recomputed and compared across the three symmetry modes inside the test. A single df_hf_int directory is shared between the HF and GW paths. Tolerance is set to 1e-5 to accommodate the integral-storage-floor residual between symmetry-reduced and no-symmetry runs.
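The ink counts quoted for the 3x3x1 mesh can be reproduced by counting orbits of k-points under a set of integer operations. The sketch below is illustrative only: `count_irreducible` is a hypothetical helper, the operation list must form a group (identity included), and the fourfold in-plane rotation used for the full-symmetry case is an assumption; the actual operations come from the input file.

```cpp
#include <array>
#include <cassert>
#include <vector>

// 2x2 integer matrix acting on mesh indices (i, j), stored row-major.
using Op = std::array<int, 4>;

// Number of symmetry-inequivalent points on an n x n mesh.
// Assumes `ops` forms a group, so the images of a representative
// under all operations are exactly its orbit.
int count_irreducible(int n, const std::vector<Op>& ops) {
  assert(n > 0);
  std::vector<bool> seen(static_cast<std::size_t>(n) * n, false);
  int count = 0;
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {
      if (seen[i * n + j]) continue;
      ++count;  // (i, j) starts a new orbit
      for (const Op& op : ops) {
        const int a = ((op[0] * i + op[1] * j) % n + n) % n;
        const int b = ((op[2] * i + op[3] * j) % n + n) % n;
        seen[a * n + b] = true;
      }
    }
  }
  return count;
}
```

Passing only the identity leaves all 9 points; adding k → -k (time reversal) pairs every point except Γ and gives 5; the assumed fourfold rotation group `{E, C4, C2, C4^3}` gives 3, matching the counts listed in the commit message.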
cu_symmetry::initialize and cugw_utils ctor now take nso alongside nao. gw_gpu_kernel forwards _nso at both scalar and X2C call sites; the k_sym_transform_ao buffer in make_cu_symmetry_data is sized nso×nso (no-op for scalar where nso==nao; X2C still passes build_k_ao=false so the buffer stays empty). Memory accounting in optimize_ntbatch verified consistent — no behavioral change.
In check_x2c_ar_symmetry, when any element of the (i, t) Sigma slice differs by more than tol, print the aa/bb/ab/ba block max diffs with element location, plus the overall max and a label identifying which sub-check is firing (full_symm vs no_symm or trs_only vs no_symm). This lets the failing pattern show whether the bug is global (all blocks show a similar diff), confined to one spinor block (e.g. ab handling), or in the Hermiticity derivation (ba != ab^dagger).
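A per-block diagnostic of this kind might look like the following sketch. The names `BlockDiff` and `block_max_diffs` are hypothetical, and the layout (row-major nso×nso with the alpha spin first in both rows and columns) is an assumption rather than something taken from the test source.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

struct BlockDiff {
  double max_diff = 0.0;  // largest |a - b| inside the block
  std::size_t row = 0;    // location within the nao x nao block
  std::size_t col = 0;
};

// Returns max diffs for the four spinor blocks in the order aa, ab, ba, bb.
std::array<BlockDiff, 4> block_max_diffs(const std::vector<cplx>& a,
                                         const std::vector<cplx>& b,
                                         std::size_t nao) {
  const std::size_t nso = 2 * nao;
  assert(a.size() == nso * nso && b.size() == nso * nso);
  std::array<BlockDiff, 4> out{};
  for (std::size_t i = 0; i < nso; ++i) {
    for (std::size_t j = 0; j < nso; ++j) {
      const std::size_t block = (i / nao) * 2 + (j / nao);
      const double d = std::abs(a[i * nso + j] - b[i * nso + j]);
      if (d > out[block].max_diff) out[block] = BlockDiff{d, i % nao, j % nao};
    }
  }
  return out;
}
```

Comparing the four `max_diff` values side by side is what distinguishes a global discrepancy from one confined to a single spinor block.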
compute_second_tau_contraction_2C previously applied U_q with hardcoded OP_N (step 2a) and OP_C (step 2c), then conjugated the result Y2 via RSCAL when q_conj_after_uq=true. Conjugating Y2 elementwise also conjugates the Y1 = V*G contribution baked into it, which double-counts the TR action — Y1 already carries the correct TR convention from upstream (copy_Gk_2c on the CPU side). Mirror the scalar compute_second_tau_contraction logic instead: flip OP_N/OP_C in 2a/2c based on q_conj_after_uq. This gives the correct W(q_deg) = U_q^* * P * U_q^T * Y1 for TR-related q in the auxiliary basis without touching Y1. Non-TR path unchanged (OP_N/OP_C unchanged). Only trs_only and full_symm runs (which include TR-related q_deg in q-stars) are affected; no_symm runs (every q is its own IBZ rep) take the q_conj_after_uq=false branch and were already correct.
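A short worked identity may help show why conjugating the finished product differs from flipping the operations. It assumes, as a simplification not stated in the commit, that the non-TR path forms $Y_2 = U_q P U_q^{\dagger} Y_1$:

$$
\overline{Y_2} \;=\; \overline{U_q\, P\, U_q^{\dagger}\, Y_1} \;=\; U_q^{*}\, P^{*}\, U_q^{T}\, Y_1^{*},
\qquad \text{whereas the stated target is} \qquad
U_q^{*}\, P\, U_q^{T}\, Y_1 .
$$

Elementwise conjugation distributes over every factor of the product, so it also reaches $Y_1$, while the target expression quoted in the commit leaves $Y_1$ untouched.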
transform_k_ao_device_2c had its own block-by-block aa↔bb / ab/ba logic that duplicated (and got wrong) what the nso×nso k_sym_transform_ao matrices from the input file already encode via σ_y spinor mixing. Delete it and route the X2C path through the same transform_k_ao_device that scalar uses; the only difference is the matrix dim (nao vs. nso), inferred from k_ao_transforms.size().

- cu_symmetry::initialize now always uploads k_ao_transforms (was scalar-only) and sizes matrix_stride_ from k_transform_dim_, so X2C scratch grows to nts*nso² automatically.
- transform_k_ao_device_impl uses `dim = k_transform_dim_` for both GEMMs, and the existing single-RSCAL TR branch conjugates the full U·G·U† output (correct for X2C because the σ_y row/col swap already happened inside the GEMMs).
- gw_gpu_kernel.cpp:472 sets build_k_ao=true for the X2C path so the nso×nso U matrices reach cu_symmetry.

Wiring the X2C GW/HF paths to actually call transform_k_ao_device (replacing host-side copy_Gk_2c / get_dm_fbz) is left for a follow-up: the GPU function expects contiguous nso×nso layout per (k, t), but the existing X2C downstream buffers are 4-block (4, nts, nao, nao). That refactor touches the qkpt buffer pool and the per-spin-block GEMMs in compute_first/second_tau_contraction_2C.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
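The layout mismatch mentioned in the follow-up note can be made concrete with a host-side repacking sketch for a single (k, t) slice. Everything here is illustrative: `pack_spinor_blocks` is a hypothetical helper, the block order aa, bb, ab, ba is an assumption, and both layouts are taken to be row-major.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Packs four nao x nao blocks (assumed order: aa, bb, ab, ba) into one
// contiguous nso x nso matrix with nso = 2 * nao.
std::vector<cplx> pack_spinor_blocks(const std::vector<cplx>& blocks,
                                     std::size_t nao) {
  assert(blocks.size() == 4 * nao * nao);
  const std::size_t nso = 2 * nao;
  // Top-left corner of each block inside the nso x nso matrix.
  const std::size_t row_off[4] = {0, nao, 0, nao};
  const std::size_t col_off[4] = {0, nao, nao, 0};
  std::vector<cplx> out(nso * nso);
  for (std::size_t b = 0; b < 4; ++b)
    for (std::size_t i = 0; i < nao; ++i)
      for (std::size_t j = 0; j < nao; ++j)
        out[(row_off[b] + i) * nso + (col_off[b] + j)] =
            blocks[(b * nao + i) * nao + j];
  return out;
}
```

A refactor that keeps data in the contiguous form throughout would avoid this copy entirely, which appears to be the direction the follow-up describes.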
dm_ibz (ink, nso, nso) on device → per-k_full transform_k_ao_device produces nso×nso dm_fbz on device → cuhf_utils picks aa/bb/ab sub-views at lda=nso. Drops the old host get_dm_fbz path (value_AO + Eigen-Map ndarray copy + upload).

cuhf_utils: new constructor takes a device (nk, nso, nso) dm_fbz pointer and stores _Dm_fbz_nso; add_exchange_to_fock branches on it for per-ss GEMM with (row_off, col_off) quadrant offsets and lda=nso. Legacy 3-block path (_Dm_fbz_sk2ba, lda=nao) preserved for scalar HF. make_cu_symmetry_data is now externally linkable; HF calls with build_q_p0=false.

Memory cost: _Dm_fbz_nso = nk·nso² (vs 3·nk·nao² before), i.e. +33% on this buffer. Ar test: ~40 KB extra.

GW X2C still uses host copy_Gk_2c; wiring it needs cugw_qkpt per-ss GEMM adjustments (lda=nso + offsets), to be done in a separate commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Apply the space-group symmetry transform `G(k) = U_k G(ik) U_k†` in the X2C GPU HF and GW kernels.
- `get_dm_fbz` and `copy_Gk_2c` previously handled time-reversal via a hardcoded spin-flip without applying the spatial rotation `U_k`; both now use `value_AO` uniformly, removing all manual `tr_conj` branching.
- Add `check_x2c_hubbard_symmetry` tests for HF and GW using `custom_hf_kernel` / `custom_gw_kernel`, verifying Sigma consistency across symmetry cases at tolerance 1e-8 on the same Hubbard+Rashba test data.

Follows up on Green-Phys/green-mbpt#50