Add lstsq: minimum-norm least squares (A⁺b) for rank-deficient / inconsistent systems by ChrisRackauckas-Claude · Pull Request #5 · SciML/SparseWithDenseRowColMatrices.jl

ChrisRackauckas-Claude · 2026-05-30T23:44:04Z

Add `lstsq`: minimum-norm least squares (`A⁺b`) for rank-deficient / inconsistent systems

Stacked on #4 (the QR PR) — please review #4 first; this branch is based on it. Ignore until reviewed by @ChrisRackauckas. Draft.

\, factorize/lu, and qr solve nonsingular systems and throw SingularException on a singular A. This adds lstsq(A, b) — the minimum-norm least-squares solution x = A⁺b (the Moore–Penrose solution: of all x minimizing ‖Ax−b‖₂, the one of smallest ‖x‖₂) — correct for rank-deficient and inconsistent systems. It is a separate, opt-in entry point; the direct solves are unchanged and keep throwing.

x = lstsq(A, b)                     # exact, dense COD (default)
x = lstsq(A, b; alg = :iterative)   # structure-exploiting LSQR/LSMR (needs `using IterativeSolvers`)
x = lstsq(A, b; λ = 1e-3)           # Tikhonov-damped (ridge) variant

Why not the bordered/Woodbury machinery

The prior QR work established (and #4's discussion shows) that solving the bordered system [S U; V -I][x;y]=[b;0] in least squares minimizes a different objective than ‖Ax−b‖ — measured residual 3–65× too large on rank-deficient inconsistent systems — and there's no Woodbury formula for the pseudoinverse of a sum. So lstsq works on A directly.

Two engines

	engine	exactness	cost	when
`:dense` (default)	LAPACK `gelsy` (complete-orthogonal decomposition) of the densified `A`	exact — matches `pinv(Matrix(A))*b` to ~1e-15	`O(n³)` / `O(n²)` mem	up to `n` ~ few thousand
`:iterative`	LSQR/LSMR via `A`'s structured matvec + adjoint, never forming dense `A`	min-norm to `atol`/`btol` (~1e-12); fragile if ill-conditioned	per-iter `O(nnz(S)+nr)`	large `n`

A = S + U·V is always fully dense (the rank-r outer product fills every entry), so a direct solve genuinely can't exploit the sparsity — the iterative path is the structure-exploiting one for large n.

Structured adjoint matvec (a real fix, beyond LS)

Added mul!(y, A', u) / Aᴴ / Aᵀ to core computing Aᴴu = Sᴴu + Vᴴ(Uᴴu). Previously A'*u fell back to LinearAlgebra's generic getindex path — ~1000× slower (it materializes columns of A). This is what lets the iterative engine drive A directly, with no LinearMaps/LinearOperators dependency.

Engine choice: IterativeSolvers, not Krylov / `KrylovJL_LSMR`

I measured this directly rather than route through LinearSolve's KrylovJL_* (which would need no new dep). With a correct adjoint, both libraries converge — but IterativeSolvers lsqr/lsmr reach ~1e-12 min-norm accuracy while Krylov lsmr/lsqr halt early at ~1e-6 (tightening atol/rtol doesn't help — Krylov stops on a different internal criterion). So Krylov is not broken (an internal review draft had claimed a catastrophic 1.92 error — that was an adjoint-wrapper artifact, corrected), but IterativeSolvers is ~6 orders tighter, which is the right call for a least-squares primitive. IterativeSolvers is a light weakdep behind an extension; the LinearSolve extension is untouched.

Adversarial review found and fixed

lsqr silent wrong answers — IterativeSolvers' isconverged is unreliable for lsqr (reports success even at the iteration cap with a garbage result; reproduced err=6.4 at maxiter=3). Now flag on budget exhaustion (iters ≥ maxiters, which cleanly separates converged/not for both solvers) and report the true LS residual ‖Aᴴ(Ax−b)‖.
Negative λ silently treated as |λ| by the dense path (rejected by iterative) → both now validate λ ≥ 0.
Eltype guard hole — Int/Rational slipped through because the check ran on the float-promoted type → now checks the raw promote_type.

Regression tests added for each. The review independently confirmed correctness across higher rank deficiency (n−3/n−4), r==0, SelectorMatrix-U, complex (adjoint ≠ transpose), mixed precision, Tikhonov, and the 5-arg adjoint mul! — all matching the pinv oracle.

Tests / CI

New test/test_lstsq.jl (~90 assertions): adjoint/transpose matvec; dense & iterative min-norm vs pinv (consistent + inconsistent, LS-optimality + null-space-free certificates); SelectorMatrix-U; full-rank reduces to \; complex; Tikhonov; the "lstsq succeeds where \/factorize/qr throw" separation; eltype/arg/λ/convergence-warning guards.
390/390 pass locally on Julia 1.10, 1.11, and 1.12.
Runic-formatted, typos clean. New dependency: IterativeSolvers (weakdep, extension-gated — core load unchanged).

🤖 Generated with Claude Code

Co-Authored-By: Chris Rackauckas accounts@chrisrackauckas.com

…nsistent systems `lstsq(A, b)` returns the Moore–Penrose minimum-norm least-squares solution, correct for rank-deficient or inconsistent A where `\`/`factorize`/`qr` throw SingularException. A separate, opt-in entry point — the direct solves stay exact and keep throwing. Why not the bordered/Woodbury path: solving [S U; V -I][x;y]=[b;0] in least squares minimizes a different objective than ‖Ax−b‖ (measured residual 3–65× too large on rank-deficient inconsistent systems), and there is no Woodbury formula for the pseudoinverse of a sum. So this works on A directly. Two engines: - :dense (default) — LAPACK gelsy (complete-orthogonal decomposition) of the densified A. Exact (matches pinv(Matrix(A))*b to machine precision), O(n³)/O(n²). A = S+UV is always fully dense, so a direct solve cannot exploit the sparsity. - :iterative — LSQR (default) / LSMR driven by A's structured matvec and adjoint, never forming the dense A; for large n. Converges to the same min-norm solution from x0=0. Lives in an IterativeSolvers extension (weakdep). Also adds a structured adjoint/transpose matvec to core (Aᴴu = Sᴴu + Vᴴ(Uᴴu)) — a real fix, since A'*u previously fell back to a ~1000× slower generic getindex path. This lets the iterative engine drive A directly (no LinearMaps/LinearOperators dependency). Engine choice: IterativeSolvers, not Krylov/LinearSolve's KrylovJL_LSMR. Measured: with a correct adjoint, IterativeSolvers lsqr/lsmr reach ~1e-12 min-norm accuracy while Krylov lsmr/lsqr halt early at ~1e-6 (and tightening tolerances does not help). Both converge — Krylov is not "broken" — but IterativeSolvers is ~6 orders tighter, which matters here. Adversarial review found and fixed: (1) lsqr's `isconverged` is unreliable (reports success at the iteration cap with a wrong answer) — now flag on budget exhaustion and report the true LS residual; (2) negative λ silently treated as |λ| by the dense path — now rejected; (3) Int/Rational slipped the eltype guard (checked the float-promoted type) — now checks the raw promoted type. Regression tests for each. Stacked on #4 (the QR PR): test_lstsq.jl asserts qr(A) throws SingularException while lstsq succeeds, so it depends on the QR entry point. 390/390 tests pass on Julia 1.10, 1.11, and 1.12. Runic-formatted, typos clean. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

…:auto the default Adds `alg=:structured` and makes `alg=:auto` the default for `lstsq`. When S is nonsingular, A = S(I + ZV) with Z = S⁻¹U, and the entire rank deficiency collapses into the small r×r capacitance C = I + V Z (nullity(A) = nullity(C), since det A = det S · det C). So A⁺b is assembled from ONE sparse factorization of S (PureKLU) + r solves for Z + small dense SVD/QR on r×r and n×s (s ≤ 2r) blocks — A is NEVER formed or factored densely. O(factor(S) + n·r²); measured 191× faster than the dense COD at n=2000, r=6. Handles consistent AND inconsistent b (a rank-k projection of b onto range(A) via the left-null basis S⁻ᴴVᴴnull(Cᴴ)). :auto uses :structured when S is nonsingular & well-conditioned, else the exact :dense COD. The structured route inverts S, so it is guarded two ways and falls back on failure: (1) klu throws on exactly-singular S; (2) near-singular S is caught by a κ̂(S)=‖S‖₁‖Z‖/‖U‖ estimate (klu does NOT throw there — it silently returns garbage Z, verified) plus a post-hoc least-squares-optimality check. `alg=:structured` (forced) errors rather than return a wrong answer; `alg=:auto` falls back to :dense. Every formula was derived and verified against the pinv(Matrix(A))*b oracle before implementation (consistent/inconsistent × rank deficiency k=1,2,3 × dense-U/SelectorMatrix-U × complex): relerr ~1e-15, min-norm + LS-optimality + null-space-free certificates. The inconsistent-case projection is essential — the naive M⁺S⁻¹b shortcut is 23% wrong on inconsistent b (the LS objective ‖Ax−b‖=‖S(Mx−S⁻¹b)‖ is S-weighted). 581/581 tests pass on Julia 1.10, 1.11, and 1.12. Runic-formatted, typos clean. No new deps (reuses the existing PureKLU dependency for the S factorization). Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas-Claude · 2026-05-31T11:54:28Z

Major update (699afb6): added a structure-exploiting DIRECT engine and made it the default.

lstsq(A, b) now defaults to alg=:auto, which uses a new alg=:structured solve when S is nonsingular: A = S(I+ZV), Z=S⁻¹U, and the entire rank deficiency collapses into the small r×r capacitance C=I+VZ (nullity(A)=nullity(C)). So A⁺b comes from one sparse factorization of S (PureKLU) + small dense SVD/QR — A is never formed or factored densely. O(factor(S)+n·r²), 191× faster than the dense COD at n=2000. Handles consistent and inconsistent b.

Guarded two ways (klu throws on exactly-singular S; a κ̂(S) estimate + post-hoc LS-optimality catch the near-singular case where klu silently returns garbage); :auto falls back to the exact :dense COD, :structured errors rather than return a wrong answer.

Every formula was derived and verified against the pinv oracle before implementation (consistent/inconsistent × rank-deficiency 1–3 × dense/SelectorMatrix-U × complex, all ~1e-15). 581/581 tests pass on Julia 1.10/1.11/1.12.

Note on the singular-S case (a core almost-banded/BVP regime): currently it falls back to the dense COD. I am separately verifying whether a null-space-peeling extension can make that case structure-exploiting-direct too; will follow up.

Will refresh the PR description once that lands.

…dles the BVP case When S is singular because its boundary rows are zeroed and supplied by a SelectorMatrix U (the `replace=true` / almost-banded case), the structured engine previously fell back to the dense COD. Add a sparse "peel": A = S + UV = (S + U Uᴴ) + U(V − Uᴴ), and for a selector U the term U Uᴴ is just r diagonal entries, so S̃ = S + U Uᴴ stays SPARSE and is typically nonsingular (the zeroed rows get a unit diagonal back), with the rank unchanged. The capacitance method then applies to (S̃, U, V − Uᴴ) — a structure-exploiting DIRECT solve, no densifying. Verified against pinv(Matrix(A))*b: ~2-3e-15 for A nonsingular AND rank-deficient, consistent AND inconsistent b. A general singular S whose null space is NOT coordinate-aligned has a dense de-singularizing correction (no sparse peel exists), so it still falls back to the exact dense COD — a genuine mathematical limit, caught by `_peel_selector` returning nothing. 617/617 tests pass on Julia 1.10, 1.11, and 1.12. Runic-formatted, typos clean. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas-Claude · 2026-05-31T12:53:26Z

Follow-up (4abc4fc): the structured direct engine now handles the singular-S BVP case too.

When S is singular because its boundary rows are zeroed and supplied by a SelectorMatrix U (the replace=true / almost-banded regime), a sparse peel re-splits A = (S + U Uᴴ) + U(V − Uᴴ). For a selector U, U Uᴴ is just r diagonal entries, so S̃ = S + U Uᴴ stays sparse and nonsingular (rank unchanged) — the capacitance method applies directly, no densifying. Verified ~2-3e-15 vs pinv for A nonsingular and rank-deficient, consistent and inconsistent.

A general singular S (null space not coordinate-aligned) has a dense de-singularizing correction — no sparse peel exists — so it correctly falls back to the exact dense COD. That is a genuine mathematical limit, not a gap.

So the structure-exploiting direct method now covers: S nonsingular, and S singular with sparse/coordinate-aligned null structure (the case this package is built for). 617/617 tests pass on Julia 1.10/1.11/1.12.

ChrisRackauckas-Claude · 2026-05-31T15:01:04Z

Superseded by #6, which combines the QR factorization and the lstsq least-squares work into one PR off main (and rewrites the QR backend onto SparseColumnPivotedQR: allocation-free solve, symbolic-reuse refactor, rank-revealing, generic eltypes). Closing in favor of #6.

ChrisRackauckas added 2 commits May 30, 2026 19:40

ChrisRackauckas-Claude mentioned this pull request May 31, 2026

QR via SparseColumnPivotedQR + minimum-norm least squares (lstsq) #6

Merged

ChrisRackauckas-Claude closed this May 31, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add lstsq: minimum-norm least squares (A⁺b) for rank-deficient / inconsistent systems#5

Add lstsq: minimum-norm least squares (A⁺b) for rank-deficient / inconsistent systems#5
ChrisRackauckas-Claude wants to merge 3 commits into
add-qr-factorizationfrom
add-lstsq

ChrisRackauckas-Claude commented May 30, 2026

Uh oh!

ChrisRackauckas-Claude commented May 31, 2026

Uh oh!

ChrisRackauckas-Claude commented May 31, 2026

Uh oh!

ChrisRackauckas-Claude commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ChrisRackauckas-Claude commented May 30, 2026