
Apply space-group symmetry transform in X2C GW and HF kernels #14

Open
gauravharsha wants to merge 15 commits into main from symm-x2c

Conversation

@gauravharsha

  • Applies the full space-group symmetry transform G(k) = U_k G(ik) U_k† in the X2C GPU HF and GW kernels. get_dm_fbz and copy_Gk_2c previously handled time-reversal via a hardcoded spin-flip without applying the spatial rotation U_k; both now use value_AO uniformly, removing all manual tr_conj branching.
  • Adds check_x2c_hubbard_symmetry tests for HF and GW using custom_hf_kernel / custom_gw_kernel, verifying Sigma consistency across symmetry cases at tolerance 1e-8 on the same Hubbard+Rashba test data.
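As a rough illustration of the reconstruction G(k) = U_k G(ik) U_k† described above, the sketch below rotates one row-major block. It does not reproduce the real value_AO signature: `rotate_block`, the `conj_output` flag (a stand-in for a time-reversal branch), and the storage layout are all hypothetical choices made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using Matrix = std::vector<cplx>;  // row-major, dim x dim (assumed layout)

// Returns U * G * U^dagger for row-major dim x dim matrices.
// If conj_output is true the result is conjugated elementwise afterwards;
// this flag is a hypothetical stand-in for a time-reversal branch.
Matrix rotate_block(const Matrix& U, const Matrix& G, std::size_t dim, bool conj_output) {
  assert(U.size() == dim * dim && G.size() == dim * dim);

  // tmp = U * G
  Matrix tmp(dim * dim, cplx(0.0, 0.0));
  for (std::size_t i = 0; i < dim; ++i)
    for (std::size_t k = 0; k < dim; ++k)
      for (std::size_t j = 0; j < dim; ++j)
        tmp[i * dim + j] += U[i * dim + k] * G[k * dim + j];

  // out = tmp * U^dagger, where (U^dagger)[k][j] = conj(U[j][k])
  Matrix out(dim * dim, cplx(0.0, 0.0));
  for (std::size_t i = 0; i < dim; ++i)
    for (std::size_t j = 0; j < dim; ++j)
      for (std::size_t k = 0; k < dim; ++k)
        out[i * dim + j] += tmp[i * dim + k] * std::conj(U[j * dim + k]);

  if (conj_output)
    for (auto& x : out) x = std::conj(x);
  return out;
}
```

With a 2×2 spinor-like U that swaps the two rows (up to sign), the diagonal entries of G trade places, which is the kind of mixing a spin-flip-only shortcut would capture without the accompanying spatial rotation.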

Follows up on Green-Phys/green-mbpt#50

gauravharsha and others added 2 commits May 18, 2026 21:17
Use value_AO in copy_Gk_2c and get_dm_fbz; restore off-diagonal
Madelung in add_Ewald.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
check_x2c_hubbard_symmetry mirrors the green-mbpt test: verifies that
one SCF step with space-group or TR-only symmetry gives the same Sigma
at every IBZ k-point as the no-symmetry full-BZ run (tol=1e-8).
s-type orbitals only; SU(2) spinor transform exercised but not
higher angular-momentum orbital representations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Applies full space-group (including time-reversal) AO-basis symmetry reconstruction G(k_full) = U_k G(k_ibz) U_k† in the X2C GPU HF/GW paths, and adds regression tests to ensure self-energies are symmetry-consistent across “no symmetry”, “TR-only”, and full space-group symmetry inputs.

Changes:

  • X2C HF: reconstruct full-BZ density matrix blocks via k_symmetry().value_AO(...) rather than manual TR-only spin-flip branching.
  • X2C GW: reconstruct full-BZ Green’s function blocks via k_symmetry().value_AO(...) for both double and single precision overloads.
  • Tests: add Hubbard+Rashba symmetry-consistency checks for both HF and GW kernels.

Reviewed changes

Copilot reviewed 3 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/cu_solver_test.cpp | Adds new Hubbard+Rashba symmetry consistency test sections for HF and GW. |
| src/hf_gpu_kernel.cpp | Updates X2C HF density-matrix reconstruction to use value_AO for full symmetry transforms. |
| src/gw_gpu_kernel.cpp | Updates X2C GW copy_Gk_2c to use value_AO for full symmetry transforms (double + float). |


Comment thread test/cu_solver_test.cpp
Comment on lines +164 to +172
if (scf_type == "GW") {
green::grids::transformer_t ft(p);
auto [kernel, solver] = green::gpu::custom_gw_kernel(true, p, nao, nso, ns, NQ, ft, bz, Sk);
solver(G_shared, S_shared);
result.resize(nts, ns, ink, nso, nso);
S_shared.fence();
if (!green::utils::context().node_rank) result << S_shared.object();
S_shared.fence();
} else {
Comment thread test/cu_solver_test.cpp Outdated
Comment on lines +107 to +112
const std::string dir = TEST_PATH + "/GW_X2C_Hubbard"s;
const std::string df_path = dir + (scf_type == "GW" ? "/df_int"s : "/df_hf_int"s);
const std::string grid_file = GRID_PATH + "/ir/1e4.h5"s;
constexpr size_t ns = 1, nk = 36, nso = 4, nao = 2;
constexpr double tol = 1e-8;

gauravharsha and others added 13 commits May 18, 2026 22:10
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ference

Mirror of the green-mbpt change: the X2C symmetry consistency check now
uses a real cubic Ar (def2-svp, --x2c 2, 3x3x1 k-mesh) calculation in
place of the Hubbard+Rashba model. This exercises both the orbital
point-group representation (p-shells in the basis) and the SU(2) spinor
part of the double-group transform — which the s-orbital Hubbard model
could not reach. nk=9; ink per run: no_symm=9, trs_only=5, full_symm=3.

The reference G_tau is extracted from iter1/G_tau/data into a minimal
data_<tag>.h5 file (G_tau only); Sigma1 and Sigma_tau are recomputed
and compared across the three symmetry modes inside the test. Single
df_hf_int directory shared between HF and GW paths. Tolerance set to
1e-5 to accommodate the integral-storage-floor residual between
symmetry-reduced and no-symmetry runs.

cu_symmetry::initialize and cugw_utils ctor now take nso alongside nao.
gw_gpu_kernel forwards _nso at both scalar and X2C call sites; the
k_sym_transform_ao buffer in make_cu_symmetry_data is sized nso×nso
(no-op for scalar where nso==nao; X2C still passes build_k_ao=false so
the buffer stays empty). Memory accounting in optimize_ntbatch
verified consistent; no behavioral change.

In check_x2c_ar_symmetry, when any element of an (i, t) Sigma slice differs
by more than tol, print the aa/bb/ab/ba block max diffs with element location,
plus the overall max and a label identifying which sub-check is firing
(full_symm vs no_symm or trs_only vs no_symm). The failing pattern helps
narrow down whether the bug is global (all blocks show a similar diff),
confined to one spinor block (e.g. ab handling), or in the Hermiticity
derivation (ba != ab^dagger).
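
A minimal sketch of such a per-block diagnostic is given here, not the code in the test itself. It assumes each Sigma slice is a row-major nso×nso matrix with nso = 2·nao and the alpha spinor components stored first; `BlockDiff`, `block_max_diff`, and `report_block_diffs` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdio>
#include <vector>

using cplx = std::complex<double>;

struct BlockDiff {
  double max_abs;
  std::size_t row, col;  // location inside the nao x nao block
};

// Largest |a - b| inside the nao x nao quadrant whose top-left corner is
// (row_off, col_off) of two row-major nso x nso matrices.
BlockDiff block_max_diff(const std::vector<cplx>& a, const std::vector<cplx>& b,
                         std::size_t nso, std::size_t nao,
                         std::size_t row_off, std::size_t col_off) {
  assert(a.size() == nso * nso && b.size() == nso * nso);
  assert(row_off + nao <= nso && col_off + nao <= nso);
  BlockDiff d;
  d.max_abs = 0.0; d.row = 0; d.col = 0;
  for (std::size_t i = 0; i < nao; ++i)
    for (std::size_t j = 0; j < nao; ++j) {
      const std::size_t idx = (row_off + i) * nso + (col_off + j);
      const double diff = std::abs(a[idx] - b[idx]);
      if (diff > d.max_abs) { d.max_abs = diff; d.row = i; d.col = j; }
    }
  return d;
}

// Prints the aa/ab/ba/bb max diffs with their locations and returns the overall max.
// Assumes nso = 2 * nao with the alpha components stored first.
double report_block_diffs(const char* label, const std::vector<cplx>& a,
                          const std::vector<cplx>& b, std::size_t nao) {
  const std::size_t nso = 2 * nao;
  const char* names[4] = {"aa", "ab", "ba", "bb"};
  const std::size_t row_off[4] = {0, 0, nao, nao};
  const std::size_t col_off[4] = {0, nao, 0, nao};
  double overall = 0.0;
  for (int s = 0; s < 4; ++s) {
    const BlockDiff d = block_max_diff(a, b, nso, nao, row_off[s], col_off[s]);
    std::printf("[%s] block %s: max diff %.3e at (%zu, %zu)\n",
                label, names[s], d.max_abs, d.row, d.col);
    if (d.max_abs > overall) overall = d.max_abs;
  }
  std::printf("[%s] overall max diff %.3e\n", label, overall);
  return overall;
}
```

A caller would compare the returned overall maximum against the tolerance and only print the breakdown on failure; the sketch prints unconditionally to stay short.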
compute_second_tau_contraction_2C previously applied U_q with hardcoded
OP_N (step 2a) and OP_C (step 2c), then conjugated the result Y2 via
RSCAL when q_conj_after_uq=true. Conjugating Y2 elementwise also
conjugates the Y1 = V*G contribution baked into it, which double-counts
the TR action — Y1 already carries the correct TR convention from
upstream (copy_Gk_2c on the CPU side).

Mirror the scalar compute_second_tau_contraction logic instead: flip
OP_N/OP_C in 2a/2c based on q_conj_after_uq. This gives the correct
W(q_deg) = U_q^* * P * U_q^T * Y1 for TR-related q in the auxiliary
basis without touching Y1.

Non-TR path unchanged (OP_N/OP_C unchanged). Only trs_only and
full_symm runs (which include TR-related q_deg in q-stars) are affected;
no_symm runs (every q is its own IBZ rep) take the q_conj_after_uq=false
branch and were already correct.
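
The double-counting argument rests on a simple identity: conjugating a product elementwise conjugates every factor, so conj(M·Y1) = conj(M)·conj(Y1), which differs from conj(M)·Y1 whenever Y1 is complex. The sketch below checks only that identity on plain row-major matrices; it is not the kernel code, and `matmul`, `conj_all`, and `max_abs_diff` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using Matrix = std::vector<cplx>;  // row-major, n x n

// C = A * B for row-major n x n matrices.
Matrix matmul(const Matrix& A, const Matrix& B, std::size_t n) {
  assert(A.size() == n * n && B.size() == n * n);
  Matrix C(n * n, cplx(0.0, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t k = 0; k < n; ++k)
      for (std::size_t j = 0; j < n; ++j)
        C[i * n + j] += A[i * n + k] * B[k * n + j];
  return C;
}

// Elementwise complex conjugate (what conjugating a whole output buffer does).
Matrix conj_all(Matrix A) {
  for (auto& x : A) x = std::conj(x);
  return A;
}

// Largest elementwise |A - B|.
double max_abs_diff(const Matrix& A, const Matrix& B) {
  assert(A.size() == B.size());
  double m = 0.0;
  for (std::size_t i = 0; i < A.size(); ++i) {
    const double d = std::abs(A[i] - B[i]);
    if (d > m) m = d;
  }
  return m;
}
```

Here `conj_all(matmul(M, Y1, n))` plays the role of conjugating Y2 after the fact, while `matmul(conj_all(M), Y1, n)` conjugates only the transform and leaves Y1 untouched, which is the effect the OP_N/OP_C flip is described as achieving.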
transform_k_ao_device_2c had its own block-by-block aa↔bb / ab/ba logic that
duplicated (and got wrong) what the nso×nso k_sym_transform_ao matrices from
the input file already encode via σ_y spinor mixing. Delete it and route the
X2C path through the same transform_k_ao_device that scalar uses — the only
difference is the matrix dim (nao vs. nso), inferred from k_ao_transforms.size().

- cu_symmetry::initialize now always uploads k_ao_transforms (was scalar-only)
  and sizes matrix_stride_ from k_transform_dim_, so X2C scratch grows to
  nts*nso² automatically.
- transform_k_ao_device_impl uses `dim = k_transform_dim_` for both GEMMs,
  and the existing single-RSCAL TR branch conjugates the full U·G·U† output
  (correct for X2C because the σ_y row/col swap already happened inside the
  GEMMs).
- gw_gpu_kernel.cpp:472 sets build_k_ao=true for the X2C path so the nso×nso
  U matrices reach cu_symmetry.

Wiring the X2C GW/HF paths to actually call transform_k_ao_device (replacing
host-side copy_Gk_2c / get_dm_fbz) is left for a follow-up: the GPU function
expects contiguous nso×nso layout per (k, t), but the existing X2C downstream
buffers are 4-block (4, nts, nao, nao). That refactor touches the qkpt buffer
pool and the per-spin-block GEMMs in compute_first/second_tau_contraction_2C.
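
To make the layout mismatch concrete, here is a host-side sketch that packs a 4-block (4, nts, nao, nao) buffer into contiguous (nts, nso, nso) storage. The block order (aa, bb, ab, ba), the row-major layout, and the name `pack_blocks_to_nso` are assumptions for illustration only; the actual buffers may be ordered differently.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Packs a row-major (4, nts, nao, nao) buffer into a row-major (nts, nso, nso)
// buffer with nso = 2 * nao.
// Assumed block order in the input: 0 = aa, 1 = bb, 2 = ab, 3 = ba.
std::vector<cplx> pack_blocks_to_nso(const std::vector<cplx>& blocks,
                                     std::size_t nts, std::size_t nao) {
  const std::size_t nso = 2 * nao;
  assert(blocks.size() == 4 * nts * nao * nao);

  // Top-left corner of each block inside the nso x nso matrix.
  const std::size_t row_off[4] = {0, nao, 0, nao};
  const std::size_t col_off[4] = {0, nao, nao, 0};

  std::vector<cplx> out(nts * nso * nso, cplx(0.0, 0.0));
  for (std::size_t s = 0; s < 4; ++s)
    for (std::size_t t = 0; t < nts; ++t)
      for (std::size_t i = 0; i < nao; ++i)
        for (std::size_t j = 0; j < nao; ++j) {
          const std::size_t src = ((s * nts + t) * nao + i) * nao + j;
          const std::size_t dst = (t * nso + row_off[s] + i) * nso + col_off[s] + j;
          out[dst] = blocks[src];
        }
  return out;
}
```

A copy like this is the naive bridge between the two layouts; the commit message instead defers to a refactor of the downstream buffers, which would avoid the extra pass.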

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dm_ibz (ink, nso, nso) on device → per-k_full transform_k_ao_device produces
nso×nso dm_fbz on device → cuhf_utils picks aa/bb/ab sub-views at lda=nso.
Drops the old host get_dm_fbz path (value_AO + Eigen-Map ndarray copy + upload).

cuhf_utils: new constructor takes a device (nk, nso, nso) dm_fbz pointer and
stores _Dm_fbz_nso; add_exchange_to_fock branches on it for per-ss GEMM with
(row_off, col_off) quadrant offsets and lda=nso. Legacy 3-block path
(_Dm_fbz_sk2ba, lda=nao) preserved for scalar HF.

make_cu_symmetry_data is now externally linkable; HF calls with build_q_p0=false.

Memory cost: _Dm_fbz_nso = nk·nso² (vs 3·nk·nao² before) — +33% on this buffer.
Ar test: ~40 KB extra.
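
The +33% figure follows from nso = 2·nao: nk·nso² = 4·nk·nao², against 3·nk·nao² for the three-block buffer. A small sketch of that arithmetic, assuming complex double elements and using hypothetical helper names:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Element count of a full (nk, nso, nso) density-matrix buffer.
constexpr std::size_t full_nso_elems(std::size_t nk, std::size_t nso) {
  return nk * nso * nso;
}

// Element count of a three-block (3, nk, nao, nao) buffer.
constexpr std::size_t three_block_elems(std::size_t nk, std::size_t nao) {
  return 3 * nk * nao * nao;
}

// Extra bytes of the full buffer over the three-block one, assuming
// nso = 2 * nao and std::complex<double> elements (an assumption here).
constexpr std::size_t extra_bytes(std::size_t nk, std::size_t nao) {
  return (full_nso_elems(nk, 2 * nao) - three_block_elems(nk, nao)) *
         sizeof(std::complex<double>);
}

// Relative growth of the buffer; equals 1/3 whenever nso = 2 * nao.
double relative_increase(std::size_t nk, std::size_t nao) {
  assert(nk > 0 && nao > 0);
  const double before = static_cast<double>(three_block_elems(nk, nao));
  const double after = static_cast<double>(full_nso_elems(nk, 2 * nao));
  return (after - before) / before;
}

// The 4:3 ratio holds independently of nk and nao.
static_assert(3 * full_nso_elems(9, 4) == 4 * three_block_elems(9, 2),
              "full buffer is 4/3 of the three-block buffer");
```

The extra storage is nk·nao² elements, so the absolute cost scales with the basis size while the relative increase stays fixed at one third.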

GW X2C still uses host copy_Gk_2c; wiring it needs cugw_qkpt per-ss GEMM
adjustments (lda=nso + offsets) — separate commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>