Skip to content

Add lstsq: minimum-norm least squares (A⁺b) for rank-deficient / inconsistent systems#5

Closed
ChrisRackauckas-Claude wants to merge 3 commits into
add-qr-factorizationfrom
add-lstsq
Closed

Add lstsq: minimum-norm least squares (A⁺b) for rank-deficient / inconsistent systems#5
ChrisRackauckas-Claude wants to merge 3 commits into
add-qr-factorizationfrom
add-lstsq

Conversation

@ChrisRackauckas-Claude
Copy link
Copy Markdown
Collaborator

Add lstsq: minimum-norm least squares (A⁺b) for rank-deficient / inconsistent systems

Stacked on #4 (the QR PR) — please review #4 first; this branch is based on it. Ignore until reviewed by @ChrisRackauckas. Draft.

\, factorize/lu, and qr solve nonsingular systems and throw SingularException on a singular A. This adds lstsq(A, b) — the minimum-norm least-squares solution x = A⁺b (the Moore–Penrose solution: of all x minimizing ‖Ax−b‖₂, the one of smallest ‖x‖₂) — correct for rank-deficient and inconsistent systems. It is a separate, opt-in entry point; the direct solves are unchanged and keep throwing.

x = lstsq(A, b)                     # exact, dense COD (default)
x = lstsq(A, b; alg = :iterative)   # structure-exploiting LSQR/LSMR (needs `using IterativeSolvers`)
x = lstsq(A, b; λ = 1e-3)           # Tikhonov-damped (ridge) variant

Why not the bordered/Woodbury machinery

The prior QR work established (and #4's discussion shows) that solving the bordered system [S U; V -I][x;y]=[b;0] in least squares minimizes a different objective than ‖Ax−b‖ — measured residual 3–65× too large on rank-deficient inconsistent systems — and there's no Woodbury formula for the pseudoinverse of a sum. So lstsq works on A directly.

Two engines

engine exactness cost when
:dense (default) LAPACK gelsy (complete-orthogonal decomposition) of the densified A exact — matches pinv(Matrix(A))*b to ~1e-15 O(n³) / O(n²) mem up to n ~ few thousand
:iterative LSQR/LSMR via A's structured matvec + adjoint, never forming dense A min-norm to atol/btol (~1e-12); fragile if ill-conditioned per-iter O(nnz(S)+nr) large n

A = S + U·V is always fully dense (the rank-r outer product fills every entry), so a direct solve genuinely can't exploit the sparsity — the iterative path is the structure-exploiting one for large n.

Structured adjoint matvec (a real fix, beyond LS)

Added mul!(y, A', u) / Aᴴ / Aᵀ to core computing Aᴴu = Sᴴu + Vᴴ(Uᴴu). Previously A'*u fell back to LinearAlgebra's generic getindex path — ~1000× slower (it materializes columns of A). This is what lets the iterative engine drive A directly, with no LinearMaps/LinearOperators dependency.

Engine choice: IterativeSolvers, not Krylov / KrylovJL_LSMR

I measured this directly rather than route through LinearSolve's KrylovJL_* (which would need no new dep). With a correct adjoint, both libraries converge — but IterativeSolvers lsqr/lsmr reach ~1e-12 min-norm accuracy while Krylov lsmr/lsqr halt early at ~1e-6 (tightening atol/rtol doesn't help — Krylov stops on a different internal criterion). So Krylov is not broken (an internal review draft had claimed a catastrophic 1.92 error — that was an adjoint-wrapper artifact, corrected), but IterativeSolvers is ~6 orders tighter, which is the right call for a least-squares primitive. IterativeSolvers is a light weakdep behind an extension; the LinearSolve extension is untouched.

Adversarial review found and fixed

  1. lsqr silent wrong answers — IterativeSolvers' isconverged is unreliable for lsqr (reports success even at the iteration cap with a garbage result; reproduced err=6.4 at maxiter=3). Now flag on budget exhaustion (iters ≥ maxiters, which cleanly separates converged/not for both solvers) and report the true LS residual ‖Aᴴ(Ax−b)‖.
  2. Negative λ silently treated as |λ| by the dense path (rejected by iterative) → both now validate λ ≥ 0.
  3. Eltype guard holeInt/Rational slipped through because the check ran on the float-promoted type → now checks the raw promote_type.

Regression tests added for each. The review independently confirmed correctness across higher rank deficiency (n−3/n−4), r==0, SelectorMatrix-U, complex (adjoint ≠ transpose), mixed precision, Tikhonov, and the 5-arg adjoint mul! — all matching the pinv oracle.

Tests / CI

  • New test/test_lstsq.jl (~90 assertions): adjoint/transpose matvec; dense & iterative min-norm vs pinv (consistent + inconsistent, LS-optimality + null-space-free certificates); SelectorMatrix-U; full-rank reduces to \; complex; Tikhonov; the "lstsq succeeds where \/factorize/qr throw" separation; eltype/arg/λ/convergence-warning guards.
  • 390/390 pass locally on Julia 1.10, 1.11, and 1.12.
  • Runic-formatted, typos clean. New dependency: IterativeSolvers (weakdep, extension-gated — core load unchanged).

🤖 Generated with Claude Code

Co-Authored-By: Chris Rackauckas accounts@chrisrackauckas.com

…nsistent systems

`lstsq(A, b)` returns the Moore–Penrose minimum-norm least-squares solution, correct
for rank-deficient or inconsistent A where `\`/`factorize`/`qr` throw SingularException.
A separate, opt-in entry point — the direct solves stay exact and keep throwing.

Why not the bordered/Woodbury path: solving [S U; V -I][x;y]=[b;0] in least squares
minimizes a different objective than ‖Ax−b‖ (measured residual 3–65× too large on
rank-deficient inconsistent systems), and there is no Woodbury formula for the
pseudoinverse of a sum. So this works on A directly.

Two engines:
- :dense (default) — LAPACK gelsy (complete-orthogonal decomposition) of the densified
  A. Exact (matches pinv(Matrix(A))*b to machine precision), O(n³)/O(n²). A = S+UV is
  always fully dense, so a direct solve cannot exploit the sparsity.
- :iterative — LSQR (default) / LSMR driven by A's structured matvec and adjoint, never
  forming the dense A; for large n. Converges to the same min-norm solution from x0=0.
  Lives in an IterativeSolvers extension (weakdep).

Also adds a structured adjoint/transpose matvec to core (Aᴴu = Sᴴu + Vᴴ(Uᴴu)) — a real
fix, since A'*u previously fell back to a ~1000× slower generic getindex path. This lets
the iterative engine drive A directly (no LinearMaps/LinearOperators dependency).

Engine choice: IterativeSolvers, not Krylov/LinearSolve's KrylovJL_LSMR. Measured: with a
correct adjoint, IterativeSolvers lsqr/lsmr reach ~1e-12 min-norm accuracy while Krylov
lsmr/lsqr halt early at ~1e-6 (and tightening tolerances does not help). Both converge —
Krylov is not "broken" — but IterativeSolvers is ~6 orders tighter, which matters here.

Adversarial review found and fixed: (1) lsqr's `isconverged` is unreliable (reports success
at the iteration cap with a wrong answer) — now flag on budget exhaustion and report the
true LS residual; (2) negative λ silently treated as |λ| by the dense path — now rejected;
(3) Int/Rational slipped the eltype guard (checked the float-promoted type) — now checks
the raw promoted type. Regression tests for each.

Stacked on #4 (the QR PR): test_lstsq.jl asserts qr(A) throws SingularException while lstsq
succeeds, so it depends on the QR entry point.

390/390 tests pass on Julia 1.10, 1.11, and 1.12. Runic-formatted, typos clean.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…:auto the default

Adds `alg=:structured` and makes `alg=:auto` the default for `lstsq`. When S is nonsingular,
A = S(I + ZV) with Z = S⁻¹U, and the entire rank deficiency collapses into the small r×r
capacitance C = I + V Z (nullity(A) = nullity(C), since det A = det S · det C). So A⁺b is
assembled from ONE sparse factorization of S (PureKLU) + r solves for Z + small dense SVD/QR
on r×r and n×s (s ≤ 2r) blocks — A is NEVER formed or factored densely. O(factor(S) + n·r²);
measured 191× faster than the dense COD at n=2000, r=6. Handles consistent AND inconsistent b
(a rank-k projection of b onto range(A) via the left-null basis S⁻ᴴVᴴnull(Cᴴ)).

:auto uses :structured when S is nonsingular & well-conditioned, else the exact :dense COD.
The structured route inverts S, so it is guarded two ways and falls back on failure: (1) klu
throws on exactly-singular S; (2) near-singular S is caught by a κ̂(S)=‖S‖₁‖Z‖/‖U‖ estimate
(klu does NOT throw there — it silently returns garbage Z, verified) plus a post-hoc
least-squares-optimality check. `alg=:structured` (forced) errors rather than return a wrong
answer; `alg=:auto` falls back to :dense.

Every formula was derived and verified against the pinv(Matrix(A))*b oracle before
implementation (consistent/inconsistent × rank deficiency k=1,2,3 × dense-U/SelectorMatrix-U ×
complex): relerr ~1e-15, min-norm + LS-optimality + null-space-free certificates. The
inconsistent-case projection is essential — the naive M⁺S⁻¹b shortcut is 23% wrong on
inconsistent b (the LS objective ‖Ax−b‖=‖S(Mx−S⁻¹b)‖ is S-weighted).

581/581 tests pass on Julia 1.10, 1.11, and 1.12. Runic-formatted, typos clean. No new deps
(reuses the existing PureKLU dependency for the S factorization).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas-Claude
Copy link
Copy Markdown
Collaborator Author

Major update (699afb6): added a structure-exploiting DIRECT engine and made it the default.

lstsq(A, b) now defaults to alg=:auto, which uses a new alg=:structured solve when S is nonsingular: A = S(I+ZV), Z=S⁻¹U, and the entire rank deficiency collapses into the small r×r capacitance C=I+VZ (nullity(A)=nullity(C)). So A⁺b comes from one sparse factorization of S (PureKLU) + small dense SVD/QRA is never formed or factored densely. O(factor(S)+n·r²), 191× faster than the dense COD at n=2000. Handles consistent and inconsistent b.

Guarded two ways (klu throws on exactly-singular S; a κ̂(S) estimate + post-hoc LS-optimality catch the near-singular case where klu silently returns garbage); :auto falls back to the exact :dense COD, :structured errors rather than return a wrong answer.

Every formula was derived and verified against the pinv oracle before implementation (consistent/inconsistent × rank-deficiency 1–3 × dense/SelectorMatrix-U × complex, all ~1e-15). 581/581 tests pass on Julia 1.10/1.11/1.12.

Note on the singular-S case (a core almost-banded/BVP regime): currently it falls back to the dense COD. I am separately verifying whether a null-space-peeling extension can make that case structure-exploiting-direct too; will follow up.

Will refresh the PR description once that lands.

…dles the BVP case

When S is singular because its boundary rows are zeroed and supplied by a SelectorMatrix U
(the `replace=true` / almost-banded case), the structured engine previously fell back to the
dense COD. Add a sparse "peel": A = S + UV = (S + U Uᴴ) + U(V − Uᴴ), and for a selector U the
term U Uᴴ is just r diagonal entries, so S̃ = S + U Uᴴ stays SPARSE and is typically nonsingular
(the zeroed rows get a unit diagonal back), with the rank unchanged. The capacitance method
then applies to (S̃, U, V − Uᴴ) — a structure-exploiting DIRECT solve, no densifying.

Verified against pinv(Matrix(A))*b: ~2-3e-15 for A nonsingular AND rank-deficient, consistent
AND inconsistent b. A general singular S whose null space is NOT coordinate-aligned has a dense
de-singularizing correction (no sparse peel exists), so it still falls back to the exact dense
COD — a genuine mathematical limit, caught by `_peel_selector` returning nothing.

617/617 tests pass on Julia 1.10, 1.11, and 1.12. Runic-formatted, typos clean.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas-Claude
Copy link
Copy Markdown
Collaborator Author

Follow-up (4abc4fc): the structured direct engine now handles the singular-S BVP case too.

When S is singular because its boundary rows are zeroed and supplied by a SelectorMatrix U (the replace=true / almost-banded regime), a sparse peel re-splits A = (S + U Uᴴ) + U(V − Uᴴ). For a selector U, U Uᴴ is just r diagonal entries, so S̃ = S + U Uᴴ stays sparse and nonsingular (rank unchanged) — the capacitance method applies directly, no densifying. Verified ~2-3e-15 vs pinv for A nonsingular and rank-deficient, consistent and inconsistent.

A general singular S (null space not coordinate-aligned) has a dense de-singularizing correction — no sparse peel exists — so it correctly falls back to the exact dense COD. That is a genuine mathematical limit, not a gap.

So the structure-exploiting direct method now covers: S nonsingular, and S singular with sparse/coordinate-aligned null structure (the case this package is built for). 617/617 tests pass on Julia 1.10/1.11/1.12.

@ChrisRackauckas-Claude
Copy link
Copy Markdown
Collaborator Author

Superseded by #6, which combines the QR factorization and the lstsq least-squares work into one PR off main (and rewrites the QR backend onto SparseColumnPivotedQR: allocation-free solve, symbolic-reuse refactor, rank-revealing, generic eltypes). Closing in favor of #6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants