Skip to content

Add QR factorization (augmented sparse QR) for SparseWithDenseRowColMatrix#4

Closed
ChrisRackauckas-Claude wants to merge 1 commit into
mainfrom
add-qr-factorization
Closed

Add QR factorization (augmented sparse QR) for SparseWithDenseRowColMatrix#4
ChrisRackauckas-Claude wants to merge 1 commit into
mainfrom
add-qr-factorization

Conversation

@ChrisRackauckas-Claude
Copy link
Copy Markdown
Collaborator

Add a QR factorization (qr) for SparseWithDenseRowColMatrix

Please ignore until reviewed by @ChrisRackauckas. Draft.

Adds a QR factorization mirroring the existing LU/Woodbury path, as requested. qr(A) builds a single sparse QR (SuiteSparseQR) of the bordered system [S U; V -I] — the numerically stable analogue of the existing augmented-LU path — returning a new SparseWithDenseRowColQRAugmented. The previous qr stub (which threw "not provided") is replaced.

F = qr(A)                 # augmented sparse QR of [S U; V -I]
F \ b;  ldiv!(F, b)       # solve; also F' \ b, transpose(F) \ b, matrix RHS
refactor!(F, A2)          # qr!(F, A2) alias — full re-qr (no symbolic reuse; see below)
solve(LinearProblem(A,b), SparseWithDenseRowColQRFactorization())   # LinearSolve, opt-in

Design decision: augmented-QR only — no Woodbury-over-qr(S)

A design workflow plus direct measurement both concluded decisively: ship only the augmented QR. A Woodbury-over-qr(S) mode shares the same κ(S)·κ(C) cancellation as the LU Woodbury path — forming S⁻¹ is what loses the accuracy, and that is independent of whether S⁻¹ comes from LU or QR. Measured forward error on A = S + U V with well-conditioned A (κ(A) ≈ 6) as κ(S) sweeps to 3.6e14:

κ(S) qr (augmented) Woodbury-qr(S) (rejected) factorize :woodbury refine=2
3.6e6 … 3.6e14 3.1e-16 (flat) up to ~1e-1 8.7e-17 → 4.5e-7

So qr(A; strategy = :woodbury) throws; :auto/:augmented are the only modes.

Honest scope — what QR does and does not buy you

This is the part worth scrutinizing. Augmented-QR is not more accurate than the existing augmented-LU path:

  • On well-conditioned A, both sit at the noise floor (~1e-13/1e-16); the win is only over the Woodbury fast path.

  • On ill-conditioned assembled A, augmented-QR is backward-stable (forward error ~κ(A)·eps), and the augmented-LU path's partial pivoting is often more accurate on these structured matrices — measured orders apart for κ(A) ≳ 1e6.

  • It is slower on every axis (banded interior + r dense rows, factor/solve/refactor ms):

      n=2000 r=4                          n=5000 r=8
      qr  (augmented QR) : 67 / 7.5 / 90    813 / 156 / 776
      lu  (augmented LU) : 47 / 1.7 / 21    633 /  11 / 188     (klu! symbolic reuse on refactor)
      lu  (Woodbury)     : 1.3/ 0.1 / n/a   3.6 / 0.3 / n/a     (fast path, ~50–60× the QR)
    

QR's genuine, distinct value is being rank-revealing and providing the QR interface that users of almost-banded solvers (FastAlmostBandedMatrices) expect — not stability-over-LU and not throughput. The README states this plainly so nobody reaches for qr expecting it to be the accurate or fast option. If we want a feature that augmented-LU/Woodbury genuinely cannot do (least-squares for rank-deficient A), that is not a clean win via this bordered system — its least-squares solution is not the least-squares solution of A x = b — so the path throws SingularException on singular A, matching the LU contract. Happy to revisit if you want least-squares semantics done properly.

Optimization steps carried over (and the ones that don't)

LU-path optimization QR? Handling
Adjoint/transpose solve Schur identity Mᴴ ↔ Aᴴ; SuiteSparseQR has no Qaug' \ b, so a separate qr(Maugᴴ) is built and lazily cached (QaugH), invalidated on refactor!.
Complex RHS over a real factorization real/imag split, same as the LU path.
r == 0 degenerate case reduces to qr(S).
LinearSolve caching interface SparseWithDenseRowColQRFactorization() (opt-in; default stays the LU algorithm). Singular AReturnCode.Infeasible, KLU parity.
Symbolic-reuse refactor! SuiteSparseQR exposes no qr!; refactor!/qr! re-factor fully. Documented; hot loops should use lu.
Zero-allocation solve no in-place ldiv!(::QRSparse, b); one allocation/solve.
Generic eltypes Float64/ComplexF64 only — see below.

Two bugs found in adversarial review and fixed (with regression tests)

  1. Spurious SingularException on nonsingular, ill-conditioned A. SuiteSparseQR's default tol = -2 deflates ill-conditioned-but-nonsingular columns and writes an exact 0 onto the R-diagonal, so qr(A) threw on matrices factorize/klu solve fine (reproduced at cond(A) = 1e12). Fixed by passing tol = 0.0 to all qr calls and making the rank decision ourselves; genuine singularity (exact-zero pivot) is still caught. Regression test added.
  2. Float32/ComplexF32 sparse qr throws CHOLMODException on Julia 1.11 (works on 1.10/1.12 — single-precision sparse QR is version-dependent). Rather than ship behavior that differs across Julia versions, the QR path is restricted to Float64/ComplexF64, which work identically everywhere; Float32/generic eltypes route to factorize/lu.

Tests / CI

  • New test/test_qr.jl (70 assertions): correctness vs BigFloat reference, the κ(S) stability sweep, the ill-conditioned-assembled-A no-spurious-throw regression, eltype rejection, adjoint/transpose with lazy-cache checks, matrix RHS, refactor!/qr!, singularity, update_lowrank! rejection, r==0, and the no-Woodbury-QR guard. Plus LinearSolve-QR tests.
  • 297/297 pass locally on Julia 1.10, 1.11, and 1.12.
  • One unrelated flaky Aqua.test_persistent_tasks timeout on loaded 1.12 hosts (the package spawns no load-time task — verified no @async/Threads.@spawn/Timer/__init__; the load subprocess exits in ~0.4s) is addressed by raising Aqua's wall-clock tmax to 60s, with a comment explaining it relaxes an over-tight timeout, not a real defect.

Runic-formatted. No new dependencies.

🤖 Generated with Claude Code

Co-Authored-By: Chris Rackauckas accounts@chrisrackauckas.com

…atrix

`qr(A)` factors the bordered system [S U; V -I] with SuiteSparseQR — the
numerically stable analogue of the augmented-LU path — returning a new
`SparseWithDenseRowColQRAugmented`. Replaces the throwing `qr` stub.

Design: ship augmented-QR only. A Woodbury-over-qr(S) mode shares the
κ(S)·κ(C) cancellation of the LU Woodbury path (forming S⁻¹ is what loses
accuracy, independent of LU vs QR) and is catastrophically inaccurate on
ill-conditioned S, so `strategy = :woodbury` throws.

Carries over: adjoint/transpose (lazily-cached separate qr of the adjoint,
since SuiteSparseQR has no Qaug' \ b), complex-RHS-over-real split, r==0,
and a LinearSolve algorithm (SparseWithDenseRowColQRFactorization, opt-in;
default stays LU). Does not carry over (SuiteSparseQR limits, documented):
symbolic-reuse refactor (qr!/refactor! re-factor fully), zero-allocation
solve, and generic eltypes.

Honest scope: augmented-QR is not more accurate than augmented-LU (LU can
be better on ill-conditioned assembled A) and is slower on every axis; its
value is being rank-revealing and the expected QR interface. README states
this plainly.

Two bugs found in adversarial review and fixed with regression tests:
- Spurious SingularException on nonsingular ill-conditioned A: SuiteSparseQR's
  default tol=-2 deflation zeroed R-diagonal pivots; pass tol=0.0 and make
  the rank decision ourselves (genuine singularity still caught).
- Float32/ComplexF32 sparse qr throws CHOLMODException on Julia 1.11 (works
  1.10/1.12): restrict the QR path to Float64/ComplexF64 for consistent
  cross-version behavior; Float32/generic route to factorize/lu.

Raise Aqua persistent_tasks tmax to 60s: no load-time task is spawned
(verified), the default 10s wall-clock budget is flaky on loaded CI hosts.

297/297 tests pass on Julia 1.10, 1.11, and 1.12. Runic-formatted. No new deps.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas-Claude
Copy link
Copy Markdown
Collaborator Author

Superseded by #6, which combines the QR factorization and the lstsq least-squares work into one PR off main (and rewrites the QR backend onto SparseColumnPivotedQR: allocation-free solve, symbolic-reuse refactor, rank-revealing, generic eltypes). Closing in favor of #6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants