Skip to content

QR via SparseColumnPivotedQR + minimum-norm least squares (lstsq)#6

Merged
ChrisRackauckas merged 2 commits into
mainfrom
qr-scpqr-and-lstsq
May 31, 2026
Merged

QR via SparseColumnPivotedQR + minimum-norm least squares (lstsq)#6
ChrisRackauckas merged 2 commits into
mainfrom
qr-scpqr-and-lstsq

Conversation

@ChrisRackauckas-Claude
Copy link
Copy Markdown
Collaborator

Stable QR (SparseColumnPivotedQR backend) + minimum-norm least squares (lstsq)

Combines and supersedes #4 (QR) and #5 (lstsq) into one PR off main, as requested. Ignore until reviewed by @ChrisRackauckas. Draft.

Two new capabilities for SparseWithDenseRowColMatrix (A = S + U V), plus the structured matvec they rely on. Everything was verified against a pinv/BigFloat oracle before implementation; 620/620 tests pass on Julia 1.10, 1.11, and 1.12.

qr(A) — stable, rank-revealing QR (now on SparseColumnPivotedQR)

Factors the bordered system [S U; V -I] with SparseColumnPivotedQR — pure-Julia, rank-revealing, column-pivoted (the QR analogue of PureKLU). The numerically stable analogue of the augmented-LU path (never forms S⁻¹); on A = S+UV with κ(A)≈6 and κ(S) swept to 3.6e14 it holds at ~3e-16 while the Woodbury path degrades to ~4.5e-7.

The backend gives the QR path the engineering niceties it lacked under SuiteSparseQR:

SuiteSparseQR (prior) SparseColumnPivotedQR (now)
solve 24 MB/solve 0 B, allocation-free ldiv!
refactor! full re-factor symbolic-reuse (Newton-loop usable)
rank detection fill-reducing perm (needed a tol=0 workaround) genuine column-pivoted rank
eltypes Float64/ComplexF64; throws on 1.11 for Float32 generic — Float32, BigFloat (to 9.9e-77), Dual, all versions

No strategy=:woodbury (Woodbury-over-qr(S) shares the κ(S)·κ(C) cancellation; ~1e-1 error).

lstsq(A, b) — minimum-norm least squares A⁺b

For rank-deficient / inconsistent systems where \/factorize/qr throw. alg=:auto (default) picks a structure-exploiting direct solve when applicable, else the exact dense COD:

  • :structured — when S is nonsingular, A = S(I+ZV) and the entire rank deficiency collapses into the r×r capacitance C = I + V S⁻¹U (nullity(A)=nullity(C)). A⁺b from one PureKLU factorization of S + small dense SVD/QRA is never densified (O(factor(S)+n·r²), >100× faster than dense at n=2000; handles consistent and inconsistent b). A selector-singular S (BVP boundary case) is peeled to a sparse nonsingular S̃ = S + UUᴴ.
  • :dense — LAPACK gelsy COD, exact; the fallback for a general singular S (whose null-space correction is dense — a genuine limit, not a gap).
  • :iterative — LSQR/LSMR via the structured matvec/adjoint (IterativeSolvers extension), for large n.

Also adds a structured adjoint/transpose matvec (Aᴴu = Sᴴu + Vᴴ(Uᴴu)) to the core, fixing a ~1000× slower generic getindex fallback for A'*u.

Notes for review

  • New dependencies: SparseColumnPivotedQR + SparseMatricesCSR (QR backend), IterativeSolvers (weakdep, lstsq iterative engine). SparseColumnPivotedQR and PureKLU are dev'd in CI until they land in General (you mentioned both are being registered — then the CI Pkg.develop step can drop).
  • Corrections made honestly along the way (each measured, not assumed): a design draft's "Krylov lsmr is catastrophically broken" was wrong (it's ~6 orders looser, not broken); I over-claimed refactor! allocates nothing (the kernel does; my wrapper rebuilds the bordered matrix — fixed the docstring); and the singular-S BVP case does have a structure-exploiting direct method (the peel), contrary to an earlier "falls back to dense" statement.
  • Adversarial review caught and fixed three real lstsq bugs (lsqr false-convergence, negative-λ, eltype guard hole), all regression-tested.

Runic-formatted, typos clean.

🤖 Generated with Claude Code

Co-Authored-By: Chris Rackauckas accounts@chrisrackauckas.com

Two new capabilities for SparseWithDenseRowColMatrix (A = S + U V), plus the
structured matvec they need.

QR — `qr(A)` factors the bordered system [S U; V -I] with SparseColumnPivotedQR
(pure-Julia, rank-revealing, column-pivoted; the QR analogue of PureKLU). It is
the numerically stable analogue of the augmented-LU path (never forms S⁻¹), and
the backend gives it an allocation-free solve, a symbolic-reuse `refactor!`/`qr!`
(usable in Newton/time-stepping loops), genuine rank-revealing singularity
detection, and generic element types (BLAS floats plus BigFloat/Dual, on every
Julia version — no SuiteSparseQR single-precision/1.11 issues). No
strategy=:woodbury (Woodbury-over-qr(S) shares the κ(S)·κ(C) cancellation and is
catastrophically inaccurate on ill-conditioned S). Verified vs BigFloat/pinv.

lstsq — `lstsq(A, b)` returns the minimum-norm least-squares solution A⁺b for
rank-deficient/inconsistent systems where \\/factorize/qr throw. Default
alg=:auto picks a structure-exploiting DIRECT solve when applicable, else the
exact dense COD:
  :structured — S nonsingular: A = S(I+ZV), rank deficiency collapses into the
    r×r capacitance C = I + V S⁻¹U; A⁺b from one PureKLU factorization of S plus
    small dense SVD/QR, never densifying A (O(factor(S)+n·r²), >100× faster than
    dense; handles consistent and inconsistent b). A selector-singular S (BVP
    boundary case) is peeled to a sparse nonsingular S̃ = S + U Uᴴ.
  :dense — LAPACK gelsy COD, exact; the fallback for a general singular S.
  :iterative — LSQR/LSMR via the structured matvec/adjoint (IterativeSolvers
    extension), for large n.
Every formula was verified against pinv(Matrix(A))*b before implementation.

Also adds a structured adjoint/transpose matvec (Aᴴu = Sᴴu + Vᴴ(Uᴴu)) to the
core, fixing the ~1000× slower generic getindex fallback for A'*u.

LinearSolve integration: SparseWithDenseRowColFactorization (default) and an
opt-in SparseWithDenseRowColQRFactorization. New deps: SparseColumnPivotedQR +
SparseMatricesCSR (QR backend), IterativeSolvers (weakdep, lstsq iterative).

620/620 tests pass on Julia 1.10, 1.11, and 1.12. Runic-formatted, typos clean.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…eated solves

Adds a reusable structured least-squares factorization so a Newton / time-stepping loop
(many solves, same A) does the expensive work once. `SparseWithDenseRowColLeastSquares(A)`
caches S's PureKLU factorization and the small dense SVD/QR decompositions; each `F \ b` /
`ldiv!(F, b)` is then a cheap back-solve that reuses them.

Measured at n=2000, r=6: a reused solve is 0.05 ms / 16 KB vs the one-shot lstsq(A, b) at
2.1 ms / 2.6 MB — ~44× faster and ~160× less allocation, since it skips re-factoring S.

The one-shot `lstsq(A, b)` is refactored to share the same setup/apply path (its post-hoc
LS-optimality guard is preserved), so its behavior is unchanged. The constructor errors when
the structured method does not apply (general singular S) — use lstsq(A, b; alg=:dense).

639/639 tests pass on Julia 1.10 and 1.11 (1.12 in progress; CI covers all three). Runic, typos clean.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas ChrisRackauckas marked this pull request as ready for review May 31, 2026 15:23
@ChrisRackauckas ChrisRackauckas merged commit 0c24446 into main May 31, 2026
11 checks passed
@ChrisRackauckas ChrisRackauckas deleted the qr-scpqr-and-lstsq branch May 31, 2026 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants