Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,14 +10,17 @@ PureKLU = "0c0d3e7f-3a8b-4f7e-b6f1-9a4d2e7c1f01"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

[weakdeps]
IterativeSolvers = "42fd0dbc-a981-5370-80f2-aaf504508153"
LinearSolve = "7ed4a6bd-45f5-4d41-b270-4a48e9bafcae"

[extensions]
SparseWithDenseRowColMatricesIterativeSolversExt = "IterativeSolvers"
SparseWithDenseRowColMatricesLinearSolveExt = "LinearSolve"

[compat]
Aqua = "0.8"
ForwardDiff = "0.10, 1"
IterativeSolvers = "0.9"
LinearAlgebra = "1"
LinearSolve = "3"
PrecompileTools = "1"
Expand All @@ -31,6 +34,7 @@ julia = "1.10"
[extras]
Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
IterativeSolvers = "42fd0dbc-a981-5370-80f2-aaf504508153"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
LinearSolve = "7ed4a6bd-45f5-4d41-b270-4a48e9bafcae"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Expand All @@ -39,4 +43,4 @@ SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Aqua", "ForwardDiff", "LinearAlgebra", "LinearSolve", "Random", "SafeTestsets", "SparseArrays", "Test"]
test = ["Aqua", "ForwardDiff", "IterativeSolvers", "LinearAlgebra", "LinearSolve", "Random", "SafeTestsets", "SparseArrays", "Test"]
61 changes: 60 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -162,6 +162,59 @@ refactor since it has no symbolic reuse) and far slower than the Woodbury fast p
error is comparable (~`1e-13`) for all three on a well-conditioned `A`. Use `qr` for the
factorization/rank-revealing semantics, not for speed.

## Rank-deficient / inconsistent least squares (`lstsq`)

`\`, `factorize`/`lu`, and `qr` solve **nonsingular** systems and throw `SingularException`
when `A` is singular. For a genuinely rank-deficient or inconsistent system, `lstsq(A, b)`
returns the **minimum-norm least-squares solution** `x = A⁺b` (the Moore–Penrose solution: of
all `x` minimizing `‖Ax − b‖₂`, the one with smallest `‖x‖₂`).

```julia
x = lstsq(A, b) # :auto — structured direct, dense fallback (default)
x = lstsq(A, b; alg = :structured) # structure-exploiting direct solve (never densifies A)
x = lstsq(A, b; alg = :dense) # exact dense complete-orthogonal decomposition (gelsy)
x = lstsq(A, b; alg = :iterative) # LSQR/LSMR (needs `using IterativeSolvers`)
x = lstsq(A, b; λ = 1e-3) # Tikhonov-damped (ridge) variant
```

This is a **separate, opt-in** entry point — `\`/`factorize`/`qr` stay exact and keep throwing
on singular `A`; least squares never happens implicitly. The bordered/Woodbury machinery
*cannot* produce `A⁺b`: solving `[S U; V -I][x;y]=[b;0]` in least squares minimizes a different
objective than `‖Ax−b‖` (measured 3–65× too large a residual on rank-deficient inconsistent
systems), and there is no Woodbury-style formula for the pseudoinverse of a sum.

Engines (`:auto` picks the first that applies):

* **`:structured`** — a **direct** solve that **never densifies `A`**. When `S` is nonsingular,
`A = S(I + ZV)` with `Z = S⁻¹U`, and the *entire* rank deficiency collapses into the small
`r × r` capacitance `C = I + V Z` (`nullity(A) = nullity(C)`, since `det A = det S · det C`).
So `A⁺b` is assembled from **one sparse factorization of `S`** (PureKLU) + `r` solves (`Z`) +
small dense SVD/QR on `r × r` and `n × s` (`s ≤ 2r`) blocks — `O(factor(S) + n·r²)`, **>100×
faster than `:dense`** for large `n` (measured 191× at `n = 2000`, `r = 6`). Handles
consistent *and* inconsistent `b` (a rank-`k` projection of `b` onto `range(A)`). A **singular
`S` with coordinate-aligned null structure** — the boundary-condition / `replace=true` case,
where the zeroed rows are supplied by a `SelectorMatrix` `U` — is handled by a sparse *peel*
`S̃ = S + U Uᴴ` (just `r` diagonal entries, so `S̃` stays sparse and is nonsingular) with
`A = S̃ + U(V − Uᴴ)`, keeping the direct structure-exploiting path. It errors only when `S` is
near-singular or singular with a *general* (dense) null space — where inverting `S` would be
silently wrong — so `:auto` falls back to `:dense` there.
* **`:dense`** — complete-orthogonal decomposition (LAPACK `gelsy`) of the densified `A`. Exact
(matches `pinv(Matrix(A))*b` to machine precision), `O(n³)` / `O(n²)`. The mandatory fallback
when `S` is near-singular, or singular with a null space that is *not* coordinate-aligned (so
the sparse peel does not apply and there is no structure-exploiting direct route — the
null-space correction would itself be dense).
* **`:iterative`** — LSQR (default) / LSMR driven by `A`'s structured matvec/adjoint, never
forming `A` — for `n` too large to densify *and* `S` singular (so neither `:structured` nor
`:dense` apply). Started from `x0 = 0` it converges to the same minimum-norm solution;
approximate (to `atol`/`btol`), fragile under ill-conditioning, and warns on non-convergence.
Lives in an extension (`using IterativeSolvers`).

`:auto` (the default) uses `:structured` when `S` is nonsingular (or singular with a
coordinate-aligned null space) and well-conditioned, otherwise the exact `:dense` COD — so you
get the fast structure-exploiting direct solve when it is safe and the exact answer always. Only `Float32`/`Float64`/`ComplexF32`/`ComplexF64` are
supported. Adjoint/transpose matvec (`A'*u`, `Aᴴ`, `Aᵀ`) is computed by the same structured
kernel and is useful on its own.

## Benchmarks

A banded sparse interior plus `r = 8` dense rows (`n = 5000`). Factoring only `S` and
Expand Down Expand Up @@ -194,9 +247,12 @@ globally-scattered `S` degrades every sparse method.)
* [x] Matrix–vector / matrix–matrix multiply (`mul!` vector path is allocation-free)
* [x] Factorization and linear solve (`factorize`/`lu`, `\`, `ldiv!`)
* [x] In-place refactorization with a fixed sparsity pattern (`refactor!`)
* [x] Adjoint / transpose solves
* [x] Adjoint / transpose solves *and* structured adjoint/transpose matvec (`A'*u`, `Aᴴ`, `Aᵀ`)
* [x] Generic element types (`BigFloat`, `ForwardDiff.Dual`, complex) — LU/Woodbury path
* [x] Rank-revealing QR (`qr`) for ill-conditioned `S` (`Float64`/`ComplexF64` only)
* [x] Minimum-norm least squares (`lstsq`) for rank-deficient / inconsistent `A` — a
structure-exploiting **direct** solve (capacitance method) when `S` is nonsingular, >100×
faster than densifying; exact dense COD fallback otherwise

`qr(A)` builds the augmented sparse-QR factorization (above). It is the stability path, not
the throughput path: no symbolic-reuse refactor, no zero-allocation solve, and `Float64`
Expand Down Expand Up @@ -317,12 +373,15 @@ augmented mode exists, and augmented QR needs no refinement); singular `A` likew
SparseWithDenseRowColMatrix SelectorMatrix
sparsepart fillpart lowrankfactors exclusive_sparsepart denserank
factorize lu qr \\ ldiv! refactor! lu! qr! update_lowrank!
lstsq # minimum-norm least squares (rank-deficient / inconsistent)
SparseWithDenseRowColWoodbury SparseWithDenseRowColAugmented SparseWithDenseRowColQRAugmented
recommend_lowrank_peel PeelRecommendation
SparseWithDenseRowColFactorization # LinearSolve.jl algorithm (LU/Woodbury, default)
SparseWithDenseRowColQRFactorization # LinearSolve.jl algorithm (stable QR, opt-in)
```

The `:iterative` engine of `lstsq` requires `using IterativeSolvers` (a package extension).

## Installation

PureKLU.jl is not yet in the General registry; install it from the SciML repository:
Expand Down
53 changes: 53 additions & 0 deletions ext/SparseWithDenseRowColMatricesIterativeSolversExt.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
module SparseWithDenseRowColMatricesIterativeSolversExt

using SparseWithDenseRowColMatrices
using SparseWithDenseRowColMatrices: SparseWithDenseRowColMatrix, _check_lstsq_eltype
using IterativeSolvers
using LinearAlgebra: LinearAlgebra

# Iterative engine for `lstsq(A, b; alg=:iterative)`. LSQR (default) / LSMR are driven entirely
# by A's structured matvec and adjoint (`mul!(y, A, x)` / `mul!(y, A', x)` defined in
# src/matvec.jl), so the dense `A = S + U·V` is never formed. Started from x0 = 0, both
# converge to the MINIMUM-NORM least-squares solution for rank-deficient / inconsistent systems
# (the min-norm guarantee is contingent on the zero start — the solver is never given a guess).
function SparseWithDenseRowColMatrices._lstsq_iterative(
A::SparseWithDenseRowColMatrix, b::AbstractVector;
solver::Symbol = :lsqr, atol::Real = 1.0e-12, btol::Real = 1.0e-12,
maxiters::Integer = 20 * size(A, 1), λ::Real = 0, verbose::Bool = false,
# accept and ignore the dense-only keyword so `alg` can be flipped without edits
rcond = nothing,
)
n = size(A, 1)
length(b) == n || throw(DimensionMismatch("A has $n columns, b has length $(length(b))"))
_check_lstsq_eltype(promote_type(eltype(A), eltype(b))) # reject non-BLAS-float up front
λ ≥ 0 || throw(ArgumentError("λ (Tikhonov damping) must be ≥ 0; got $λ"))

x, ch = if solver === :lsqr
IterativeSolvers.lsqr(A, b; damp = λ, atol = atol, btol = btol, maxiter = maxiters, log = true)
elseif solver === :lsmr
IterativeSolvers.lsmr(A, b; λ = λ, atol = atol, btol = btol, maxiter = maxiters, log = true)
else
throw(ArgumentError("solver must be :lsqr or :lsmr; got :$solver"))
end

# IterativeSolvers' `isconverged` is unreliable here (lsqr reports success even when it
# stops at the iteration cap with a wrong answer), so flag on budget exhaustion — which
# cleanly separates converged from non-converged for both lsqr and lsmr — and report the
# true least-squares optimality ‖Aᴴ(Ax−b)‖ (one cheap structured adjoint matvec; →0 at a
# minimizer) rather than trusting the solver's own flag.
capped = ch.iters ≥ maxiters
verbose && @info "lstsq(alg=:iterative)" solver converged = !capped iters = ch.iters
if capped
aresid = LinearAlgebra.norm(A' * (A * x .- b))
@warn(
"lstsq(...; alg=:iterative, solver=:$solver) hit the $maxiters-iteration budget " *
"without converging — the result is NOT a reliable least-squares solution. " *
"Raise `maxiters`, or use the exact `alg=:dense` if `A` is small enough to densify.",
least_squares_residual = aresid,
iters = ch.iters,
)
end
return x
end

end # module
3 changes: 2 additions & 1 deletion src/SparseWithDenseRowColMatrices.jl
Original file line number Diff line number Diff line change
Expand Up @@ -17,12 +17,13 @@ include("woodbury.jl")
include("augmented.jl")
include("qr.jl")
include("factorize.jl")
include("lstsq.jl")
include("detect.jl")

export SparseWithDenseRowColMatrix, SelectorMatrix,
sparsepart, fillpart, lowrankfactors, exclusive_sparsepart, denserank,
SparseWithDenseRowColWoodbury, SparseWithDenseRowColAugmented,
SparseWithDenseRowColQRAugmented, refactor!, update_lowrank!,
SparseWithDenseRowColQRAugmented, refactor!, update_lowrank!, lstsq,
recommend_lowrank_peel, PeelRecommendation,
SparseWithDenseRowColFactorization, SparseWithDenseRowColQRFactorization

Expand Down
Loading
Loading