AMGX 2.5.0: FGMRES/BiCGStab stagnates near 3e-3 residual on large real general system; many rows have near-zero row-sum; some AMG configs produce NaN residual #361

@Guo150-3

Summary
I am solving a large sparse linear system on GPU using NVIDIA AMGX 2.5.0 (via pyamgx).
The system is MatrixMarket "coordinate real general" with AMGX extension "%%AMGX 1 1 rhs".
AMGX runs and returns a finite solution vector, but the solver does not converge: the residual stagnates at about 2.9e-3 even after 2000 iterations.
Some AMG configurations also produce NaN/-NaN residual starting at iteration 0.

I would like guidance on:

  1. Whether the matrix appears near-singular or has a nullspace (for example, because a constraint is missing), and what AMGX recommends in that case.
  2. Recommended AMGX configurations for large "real general" matrices that avoid NaN residual and improve convergence (smoother/coarse solver/strength settings).
  3. Whether scaling/normalization is needed before/within AMGX.
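
To make question 3 concrete, here is a minimal sketch of the kind of pre-scaling I have in mind, done in SciPy before uploading to AMGX. It applies symmetric diagonal scaling so that every diagonal entry has magnitude 1. The function names are hypothetical and this is only one possible scaling, not something AMGX prescribes.

```python
import numpy as np
import scipy.sparse as sp


def symmetric_diag_scale(A, b):
    """Scale A x = b to (D A D) y = D b with D = diag(1 / sqrt(|a_ii|)).

    Returns the scaled matrix, the scaled right-hand side, and the vector d
    needed to recover x = d * y afterwards.
    """
    A = sp.csr_matrix(A)
    diag = np.abs(A.diagonal())
    if np.any(diag == 0.0):
        raise ValueError("zero diagonal entry; this scaling is undefined")
    d = 1.0 / np.sqrt(diag)
    D = sp.diags(d)
    A_scaled = (D @ A @ D).tocsr()
    return A_scaled, d * b, d


def unscale_solution(y, d):
    """Map the solution y of the scaled system back to x of the original one."""
    return d * y
```

The scaled system would be passed to the solver in place of the original, and `unscale_solution` applied to the result. Whether this helps convergence here is exactly what I am unsure about.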

System / Matrix
File header:
%%MatrixMarket matrix coordinate real general
%%AMGX 1 1 rhs

Size (from file scan):

  • N = 2,466,160 (square)
  • nnz = 85,387,200
  • avg nnz/row ≈ 34.62
  • row nnz min/max: 25 / 35

Diagonal/value checks (scan over COO entries):

  • missing diagonal rows: 0 (0.000000%)
  • diag |min|: 3.36151e-10
  • diag |max|: 2.38569e-4
  • diag near0(<1e-30): 0
  • value NaN count: 0, Inf count: 0
  • grep check: no "nan/inf" tokens in the file

Row-sum / scaling stats (scan over COO entries):

  • |row_sum| abs min/mean/max:
    2.981e-13 / 2.598e-06 / 3.164e-4
  • fraction(|row_sum| < 1e-12) = 0.000001
  • fraction(|row_sum| < 1e-10) = 0.000101
  • fraction(|row_sum| < 1e-08) = 0.043440
  • fraction(|row_sum| < 1e-06) = 0.594001

Diagonal dominance proxy:
|diag|/rowL1 min/mean/max:
7.226e-05 / 0.2668 / 0.9420
quantiles |diag|/rowL1 [0,1%,10%,50%,90%,99%,100%]:
[7.226e-05, 3.098e-02, 4.379e-02, 2.904e-01, 4.384e-01, 5.300e-01, 9.420e-01]
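
For reference, a scan of this kind can be sketched as below. This is an illustration rather than the exact script I ran; the function name is hypothetical and it assumes 0-based COO indices (MatrixMarket files are 1-based, so subtract 1 after reading). It also reports the row sum relative to the row L1 norm, since the absolute thresholds above are hard to interpret when all entries are of order 1e-4 or smaller.

```python
import numpy as np


def coo_row_stats(rows, cols, vals, n):
    """Per-row statistics from COO triplets (duplicate entries are summed)."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    vals = np.asarray(vals, dtype=float)

    row_sum = np.bincount(rows, weights=vals, minlength=n)
    row_l1 = np.bincount(rows, weights=np.abs(vals), minlength=n)

    # Diagonal entries are those with row index equal to column index.
    on_diag = rows == cols
    diag = np.bincount(rows[on_diag], weights=vals[on_diag], minlength=n)

    # Avoid division by zero for empty rows.
    safe_l1 = np.where(row_l1 > 0.0, row_l1, 1.0)
    return {
        "row_sum": row_sum,
        "rel_row_sum": np.abs(row_sum) / safe_l1,  # scale-independent
        "diag_ratio": np.abs(diag) / safe_l1,      # |diag| / rowL1
    }
```

If `rel_row_sum` is close to zero for most rows, the constant vector is close to a null vector of the matrix, which would be consistent with the near-singular behaviour I suspect.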

Environment
OS:
Linux feng 6.8.0-87-generic #88-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 11 09:28:41 UTC 2025 x86_64

GPUs:
3x NVIDIA RTX A5000

Driver/CUDA:

  • NVIDIA driver 535.274.02
  • CUDA driver version 12.2
  • AMGX compiled with CUDA Runtime 12.5, using CUDA driver 12.2

AMGX:
AMGX version 2.5.0 (Built on Sep 28 2025, 14:00:05)

Python / wrapper:

  • conda python 3.11.14
  • numpy 2.3.5, scipy 1.16.3
  • pyamgx commit: 6229ff008ee5a264cfc1799eeb2f83d96da0aadc

The saved x_solution.npy is finite (no NaN/Inf), min/max approx -5.02 / 7.17.
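
A finite solution vector does not by itself say how good the solution is, so the residual can also be checked outside AMGX. The sketch below does this with SciPy; the function name is hypothetical, and A and b are assumed to be already loaded as a sparse matrix and a NumPy vector.

```python
import numpy as np
import scipy.sparse as sp


def residual_report(A, x, b):
    """Compute the true residual r = b - A x and its 2-norms."""
    r = b - A @ x
    r_norm = float(np.linalg.norm(r))
    b_norm = float(np.linalg.norm(b))
    return {
        "finite": bool(np.all(np.isfinite(r))),
        "abs": r_norm,
        # Relative to ||b||; this matches a RELATIVE_INI ratio only if the
        # initial guess was zero (an assumption, not checked here).
        "rel_b": r_norm / b_norm if b_norm > 0.0 else float("nan"),
    }
```

Calling it with `x = np.load("x_solution.npy")` would show whether the residual norm reported by AMGX agrees with the one recomputed on the host.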

Results
Case 1: BiCGStab + NOSOLVER

  • max_iters=500, tolerance=1e-8, convergence=RELATIVE_INI
  • residual does not change:
    Ini 2.992957e-03 -> Final 2.992957e-03 (rate=1.0)

Case 2: FGMRES + AMG (PMIS/D2, JACOBI_L1 smoother)

  • max_iters=2000, tolerance=1e-6, convergence=RELATIVE_INI, presweeps/postsweeps=3/3
  • residual stagnates:
    Ini 2.992957e-03 -> Final 2.919952e-03
    Total reduction (final/ini): 0.9756
    Avg rate ~1.0000

Case 3: other AMG configs

  • residual becomes NaN/-NaN at iter 0

Attachments

  • solve_gpu_pyamgx.py (small script)
  • JSON configs for the above cases
  • logs (AMG Grid + first ~10 iters + last ~20 lines)
  • diag/row-sum scan outputs above

I cannot upload the full matrix due to size; I can provide a smaller extracted test case if you recommend a useful extraction strategy.
