AMGX 2.5.0: FGMRES/BiCGStab stagnates near 3e-3 residual on large real general system; many rows have near-zero row-sum; some AMG configs produce NaN residual

Summary
I am solving a large sparse linear system on GPU using NVIDIA AMGX 2.5.0 (via pyamgx).
The system is MatrixMarket "coordinate real general" with AMGX extension "%%AMGX 1 1 rhs".
AMGX runs and returns a finite solution vector, but the solver does not converge: the residual stagnates near 2.9e-3 even after 2000 iterations.
Some AMG configurations also produce NaN/-NaN residual starting at iteration 0.

I would like guidance on:
1) Whether the matrix appears near-singular or has a nullspace (e.g. a missing constraint), and what AMGX recommends in that case.
2) Recommended AMGX configurations for large "real general" matrices that avoid NaN residual and improve convergence (smoother/coarse solver/strength settings).
3) Whether scaling/normalization is needed before/within AMGX.
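To make question 3 concrete, the kind of pre-scaling I have in mind is symmetric diagonal (Jacobi) scaling applied before the matrix is handed to AMGX. The following is only a minimal SciPy sketch under that assumption; the helper name `symmetric_diagonal_scale` is hypothetical, and I have not confirmed that this scaling helps for this system.

```python
import numpy as np
import scipy.sparse as sp

def symmetric_diagonal_scale(A, b):
    """Return (A_s, b_s, d) with A_s = D A D, b_s = D b, D = diag(1/sqrt(|a_ii|)).

    Hypothetical helper for illustration; not part of AMGX or pyamgx.
    """
    A = sp.csr_matrix(A, dtype=np.float64)
    diag = np.abs(A.diagonal())
    if np.any(diag == 0.0):
        raise ValueError("zero diagonal entry; this scaling is undefined")
    d = 1.0 / np.sqrt(diag)
    D = sp.diags(d)
    A_s = (D @ A @ D).tocsr()   # scaled matrix has |diagonal| == 1
    b_s = d * b                 # right-hand side scaled consistently
    return A_s, b_s, d

# Tiny demo with badly scaled diagonal entries (made-up numbers).
A = sp.csr_matrix(np.array([[4e-4, -1e-4],
                            [-1e-4, 1e-10]]))
b = np.array([1e-4, 1e-10])

A_s, b_s, d = symmetric_diagonal_scale(A, b)
y = np.linalg.solve(A_s.toarray(), b_s)  # stand-in for the AMGX solve
x = d * y                                # map back to the original unknowns
```

The solver would then work on `A_s y = b_s`, and the original unknowns are recovered as `x = d * y`.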

System / Matrix
File header:
%%MatrixMarket matrix coordinate real general
%%AMGX 1 1 rhs

Size (from file scan):
- N = 2,466,160 (square)
- nnz = 85,387,200
- avg nnz/row ≈ 34.62
- row nnz min/max: 25 / 35

Diagonal/value checks (scan over COO entries):
- missing diagonal rows: 0 (0.000000%)
- diag |min|: 3.36151e-10
- diag |max|: 2.38569e-4
- diag near0(<1e-30): 0
- value NaN count: 0, Inf count: 0
- grep check: no "nan/inf" tokens in the file

Row-sum / scaling stats (scan over COO entries):
- |row_sum| min/mean/max:
  2.981e-13 / 2.598e-06 / 3.164e-4
- fraction(|row_sum| < 1e-12) = 0.000001
- fraction(|row_sum| < 1e-10) = 0.000101
- fraction(|row_sum| < 1e-08) = 0.043440
- fraction(|row_sum| < 1e-06) = 0.594001
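
For reference, statistics of this kind can be reproduced with SciPy roughly as follows. This is a sketch, not the actual scan script (which streams the COO entries); the function name `row_sum_stats` is hypothetical. Because the entries themselves are small (|diag| max is 2.4e-4), the sketch also reports the row sum relative to the row L1 norm, which may be a more meaningful indicator of a near-constant nullspace than the absolute thresholds.

```python
import numpy as np
import scipy.sparse as sp

def row_sum_stats(A, thresholds=(1e-12, 1e-10, 1e-8, 1e-6)):
    """Absolute and relative row-sum statistics (hypothetical helper)."""
    A = sp.csr_matrix(A, dtype=np.float64)
    row_sum = np.abs(np.asarray(A.sum(axis=1)).ravel())
    row_l1 = np.asarray(abs(A).sum(axis=1)).ravel()
    # |row_sum| / rowL1 lies in [0, 1]; 0 means the row annihilates the constant vector.
    rel = np.divide(row_sum, row_l1, out=np.zeros_like(row_sum), where=row_l1 > 0)
    return {
        "abs_min_mean_max": (row_sum.min(), row_sum.mean(), row_sum.max()),
        "abs_fraction_below": {t: float(np.mean(row_sum < t)) for t in thresholds},
        "rel_min_mean_max": (rel.min(), rel.mean(), rel.max()),
    }

# Demo: scaled 1D Laplacian; interior rows sum to zero, boundary rows do not.
A_demo = 1e-4 * sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(5, 5))
stats = row_sum_stats(A_demo)
```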

Diagonal dominance proxy:
|diag|/rowL1 min/mean/max:
7.226e-05 / 0.2668 / 0.9420
quantiles |diag|/rowL1 [0,1%,10%,50%,90%,99%,100%]:
[7.226e-05, 3.098e-02, 4.379e-02, 2.904e-01, 4.384e-01, 5.300e-01, 9.420e-01]
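
A sketch of how this proxy can be computed, assuming rowL1 is the sum of absolute values over the whole row including the diagonal entry (so a ratio of at least 0.5 corresponds to a diagonally dominant row). The function name `diag_dominance_ratio` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def diag_dominance_ratio(A):
    """Per-row |a_ii| / sum_j |a_ij| (row L1 norm includes the diagonal)."""
    A = sp.csr_matrix(A, dtype=np.float64)
    diag = np.abs(A.diagonal())
    row_l1 = np.asarray(abs(A).sum(axis=1)).ravel()
    return np.divide(diag, row_l1, out=np.zeros_like(diag), where=row_l1 > 0)

# Demo: 1D Laplacian; interior rows give 2/4 = 0.5, boundary rows give 2/3.
A_demo = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(5, 5))
ratio = diag_dominance_ratio(A_demo)
quantiles = np.quantile(ratio, [0.0, 0.01, 0.10, 0.50, 0.90, 0.99, 1.0])
```

Under that assumption, the median of 0.29 reported above would mean that most rows are not diagonally dominant.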

Environment
OS:
Linux feng 6.8.0-87-generic #88-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 11 09:28:41 UTC 2025 x86_64

GPUs:
3x NVIDIA RTX A5000

Driver/CUDA:
- NVIDIA driver 535.274.02
- CUDA driver version 12.2
- AMGX compiled with CUDA Runtime 12.5, using CUDA driver 12.2

AMGX:
AMGX version 2.5.0 (Built on Sep 28 2025, 14:00:05)

Python / wrapper:
- conda python 3.11.14
- numpy 2.3.5, scipy 1.16.3
- pyamgx commit: 6229ff008ee5a264cfc1799eeb2f83d96da0aadc

The saved x_solution.npy is finite (no NaN/Inf), min/max approx -5.02 / 7.17.
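
A finite solution says nothing about its accuracy, so the saved vector can be checked independently of AMGX's own residual report. A minimal sketch, assuming the matrix and right-hand side are available on the host as a SciPy sparse matrix and a NumPy array (the function name `relative_residual` is hypothetical):

```python
import numpy as np
import scipy.sparse as sp

def relative_residual(A, x, b):
    """Return ||b - A x||_2 / ||b||_2, computed on the host."""
    r = b - A @ x
    return np.linalg.norm(r) / np.linalg.norm(b)

# Demo on a tiny made-up system.
A_demo = sp.csr_matrix(np.array([[4.0, 1.0],
                                 [1.0, 3.0]]))
b_demo = np.array([1.0, 2.0])
x_exact = np.linalg.solve(A_demo.toarray(), b_demo)

res_exact = relative_residual(A_demo, x_exact, b_demo)      # close to machine precision
res_zero = relative_residual(A_demo, np.zeros(2), b_demo)   # zero guess gives exactly 1.0
```

For the real system, `x` would be `np.load("x_solution.npy")`; comparing the result against the ratio AMGX prints helps separate a solver problem from a reporting or scaling issue.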

Results
Case 1: BiCGStab + NOSOLVER
- max_iters=500, tolerance=1e-8, convergence=RELATIVE_INI
- residual does not change:
  Ini 2.992957e-03 -> Final 2.992957e-03 (rate=1.0)

Case 2: FGMRES + AMG (PMIS/D2, JACOBI_L1 smoother)
- max_iters=2000, tolerance=1e-6, convergence=RELATIVE_INI, presweeps/postsweeps=3/3
- residual stagnates:
  Ini 2.992957e-03 -> Final 2.919952e-03
  Total reduction (final/ini): 0.9756
  Avg rate ~1.0000
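
For readers without the attachments, a Case 2 configuration would look roughly like the dict below. This is a reconstruction from the settings listed above, not a copy of the attached JSON: `algorithm`, `cycle`, the inner `max_iters`, `norm`, and the monitoring flags are assumptions added to make the sketch complete, and other parameters in the attached file (strength threshold, coarse solver, restart length) are deliberately left out.

```python
import json

# Reconstruction of Case 2 from the issue text; fields marked "assumed"
# are NOT taken from the attached config.
case2_config = {
    "config_version": 2,
    "solver": {
        "solver": "FGMRES",
        "max_iters": 2000,
        "tolerance": 1e-6,
        "convergence": "RELATIVE_INI",
        "norm": "L2",              # assumed
        "monitor_residual": 1,     # assumed
        "print_solve_stats": 1,    # assumed
        "preconditioner": {
            "solver": "AMG",
            "algorithm": "CLASSICAL",  # assumed (PMIS/D2 are classical-AMG options)
            "selector": "PMIS",
            "interpolator": "D2",
            "smoother": {"solver": "JACOBI_L1"},
            "presweeps": 3,
            "postsweeps": 3,
            "cycle": "V",          # assumed
            "max_iters": 1,        # assumed: one AMG cycle per FGMRES iteration
            "print_grid_stats": 1, # assumed
        },
    },
}

# Serialize to the JSON form that AMGX config files use.
config_json = json.dumps(case2_config, indent=4)
```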

Case 3: other AMG configs
- residual becomes NaN/-NaN at iter 0

Attachments
- solve_gpu_pyamgx.py (small script)
- JSON configs for the above cases
- logs (AMG Grid + first ~10 iters + last ~20 lines)
- diag/row-sum scan outputs above

I cannot upload the full matrix due to size; I can provide a smaller extracted test case if you recommend a useful extraction strategy.
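
One extraction strategy I could try is a contiguous principal sub-block of the matrix, sketched below with SciPy. This is only an illustration: the helper name `extract_principal_block` is hypothetical, cutting the couplings at the block boundary may change the behaviour, and `mmwrite` does not emit the `%%AMGX 1 1 rhs` section, so the right-hand side would have to be saved separately.

```python
import os
import tempfile
import numpy as np
import scipy.sparse as sp
from scipy.io import mmread, mmwrite

def extract_principal_block(A, b, start, size):
    """Return rows/cols [start, start+size) of A and the matching slice of b."""
    A = sp.csr_matrix(A)
    stop = min(start + size, A.shape[0])
    A_sub = A[start:stop, start:stop].tocoo()
    return A_sub, np.asarray(b)[start:stop]

# Demo on a small made-up matrix.
A_demo = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(6, 6), format="csr")
b_demo = np.arange(6, dtype=np.float64)
A_sub, b_sub = extract_principal_block(A_demo, b_demo, start=1, size=3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sub.mtx")
    # symmetry="general" keeps the "coordinate real general" header style.
    mmwrite(path, A_sub, symmetry="general")
    A_back = mmread(path)
    roundtrip_ok = np.allclose(A_back.toarray(), A_sub.toarray())
```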

