diff --git a/README.md b/README.md index 0877c92..9f249be 100644 --- a/README.md +++ b/README.md @@ -106,6 +106,62 @@ The `refactor!` + `\` loop is allocation-free apart from the tiny `O(r)` dense p * **`:auto`** (default) — Woodbury, automatically falling back to the augmented system if `S` is singular or `C` is ill-conditioned. +### QR (the numerically stable path) + +`qr(A; strategy=:augmented)` factors the bordered system `[S U; V -I]` with **SuiteSparseQR** +(the stdlib sparse `qr`), returning a `SparseWithDenseRowColQRAugmented`. Like the augmented +LU path it never forms `S⁻¹`, so it stays accurate when the sparse part `S` is ill-conditioned +or nearly singular but the dense rows regularize `A` — the +[FastAlmostBandedMatrices](https://github.com/SciML/FastAlmostBandedMatrices.jl) regime, where +QR is the conventional choice. Measured forward error on `A = S + U V` with `κ(A) ≈ 6` as +`κ(S)` sweeps from `3.6e6` to `3.6e14` (reference: BigFloat dense solve): + +``` + κ(S) qr (augmented) factorize :woodbury, refine=2 factorize :augmented (LU) + 3.6e6 … 3.1e-16 (flat) 8.7e-17 → 4.5e-7 1.2e-16 (flat) + 3.6e14 +``` + +The Woodbury path's error grows with `κ(S)·κ(C)`; here (where the *assembled* `A` stays +well-conditioned, `κ(A) ≈ 6`) both augmented paths hold at the noise floor. + +**Be clear about what QR does and does not buy you.** Augmented QR is *not* more accurate than +the augmented **LU** path. It is backward-stable, so its forward error tracks `κ(A)·eps`; when +the *assembled* `A` is itself ill-conditioned the augmented-LU path's partial pivoting can be +**more** accurate on these structured matrices (measured: LU near the noise floor where QR sits +at `κ(A)·eps`, orders apart for `κ(A) ≳ 1e6`). QR's genuine, distinct properties versus +augmented-LU are that it is **rank-revealing** (a clean numerical-rank/singularity signal) and +that it is the QR-based interface users of almost-banded solvers expect — not higher accuracy, +which the LU path already delivers, and not throughput (it is slower; see the trade-offs +below). Reach for `qr` when you specifically want a QR factorization or its rank diagnostics; +otherwise `factorize`/`lu` is the better default. + +There is **no** `strategy=:woodbury` for `qr`: a Woodbury-over-`qr(S)` solve shares the same +`κ(S)·κ(C)` cancellation and is catastrophically inaccurate on ill-conditioned `S` (~`1e-1`), +so it is deliberately not offered. `qr` throws `SingularException` when the bordered system is +numerically rank-deficient (`κ ≳ 1/(m·eps)`); a merely ill-conditioned, accurately-solvable `A` +is **not** rejected (SuiteSparseQR's aggressive default column deflation is disabled). + +Trade-offs relative to `factorize`/`lu`: QR supports **only** `Float64`/`ComplexF64` +(SuiteSparseQR's single-precision sparse QR is version-dependent — it throws on Julia 1.11 — so +`Float32`/`ComplexF32` route to the LU path), has **no** symbolic-reuse `refactor!` +(SuiteSparseQR exposes no `qr!`, so it re-factors fully), and **allocates** per solve (no +in-place `ldiv!`). It is the stability path, not the throughput path — for `Float32`/generic +eltypes or hot Newton refactor loops use `factorize`/`lu`. The cost is real and consistent +(banded interior + `r` dense rows, factor / solve / refactor, ms): + +``` + n=2000 r=4 n=5000 r=8 + qr (augmented QR) : 67 / 7.5 / 90 813 / 156 / 776 ← stability path, slowest + lu (augmented LU) : 47 / 1.7 / 21 633 / 11 / 188 ← klu! symbolic reuse on refactor + lu (Woodbury) : 1.3/ 0.1 / n/a 3.6 / 0.3 / n/a ← fast path (forms S⁻¹), ~50–60× +``` + +QR is slower than the augmented-LU path on every axis (≈1.4× factor, ≈4–14× solve, ≈4× +refactor since it has no symbolic reuse) and far slower than the Woodbury fast path; forward +error is comparable (~`1e-13`) for all three on a well-conditioned `A`. Use `qr` for the +factorization/rank-revealing semantics, not for speed. + ## Benchmarks A banded sparse interior plus `r = 8` dense rows (`n = 5000`). Factoring only `S` and @@ -139,10 +195,12 @@ globally-scattered `S` degrades every sparse method.) * [x] Factorization and linear solve (`factorize`/`lu`, `\`, `ldiv!`) * [x] In-place refactorization with a fixed sparsity pattern (`refactor!`) * [x] Adjoint / transpose solves -* [x] Generic element types (`BigFloat`, `ForwardDiff.Dual`, complex) +* [x] Generic element types (`BigFloat`, `ForwardDiff.Dual`, complex) — LU/Woodbury path +* [x] Rank-revealing QR (`qr`) for ill-conditioned `S` (`Float64`/`ComplexF64` only) -`qr` is intentionally not provided — a general sparse `QR` causes far more fill than the -LU-based KLU path; use `factorize`/`lu`. +`qr(A)` builds the augmented sparse-QR factorization (above). It is the stability path, not +the throughput path: no symbolic-reuse refactor, no zero-allocation solve, and `Float64` +/`ComplexF64` only — use `factorize`/`lu` for `Float32`/generic eltypes or hot refactor loops. ## When is this the right tool? @@ -248,15 +306,21 @@ not throw), again matching KLU. One caveat: once a singular `S` forces the augme the cache keeps reusing it even if later `S` values are nonsingular — `init` a fresh cache to get back the faster Woodbury path. +For the stable QR path, pass `SparseWithDenseRowColQRFactorization(; check_pattern)` instead. +It has no `reuse_symbolic`/`strategy`/`refine` (SuiteSparseQR has no symbolic reuse, only the +augmented mode exists, and augmented QR needs no refinement); singular `A` likewise returns +`ReturnCode.Infeasible`. It is opt-in — the default algorithm stays the LU one. + ## Public API ``` SparseWithDenseRowColMatrix SelectorMatrix sparsepart fillpart lowrankfactors exclusive_sparsepart denserank -factorize lu \\ ldiv! refactor! lu! update_lowrank! -SparseWithDenseRowColWoodbury SparseWithDenseRowColAugmented +factorize lu qr \\ ldiv! refactor! lu! qr! update_lowrank! +SparseWithDenseRowColWoodbury SparseWithDenseRowColAugmented SparseWithDenseRowColQRAugmented recommend_lowrank_peel PeelRecommendation -SparseWithDenseRowColFactorization # LinearSolve.jl algorithm (extension) +SparseWithDenseRowColFactorization # LinearSolve.jl algorithm (LU/Woodbury, default) +SparseWithDenseRowColQRFactorization # LinearSolve.jl algorithm (stable QR, opt-in) ``` ## Installation diff --git a/ext/SparseWithDenseRowColMatricesLinearSolveExt.jl b/ext/SparseWithDenseRowColMatricesLinearSolveExt.jl index 880d332..e241e57 100644 --- a/ext/SparseWithDenseRowColMatricesLinearSolveExt.jl +++ b/ext/SparseWithDenseRowColMatricesLinearSolveExt.jl @@ -2,10 +2,11 @@ module SparseWithDenseRowColMatricesLinearSolveExt using SparseWithDenseRowColMatrices using SparseWithDenseRowColMatrices: SparseWithDenseRowColMatrix, SparseWithDenseRowColWoodbury, - SparseWithDenseRowColAugmented, denserank, refactor!, SparseWithDenseRowColFactorization + SparseWithDenseRowColAugmented, SparseWithDenseRowColQRAugmented, denserank, refactor!, + SparseWithDenseRowColFactorization, SparseWithDenseRowColQRFactorization using LinearSolve using LinearSolve: LinearCache, OperatorAssumptions, LinearVerbosity, AbstractSparseFactorization -using LinearAlgebra: LinearAlgebra, ldiv!, issuccess, Factorization, SingularException, factorize +using LinearAlgebra: LinearAlgebra, ldiv!, issuccess, Factorization, SingularException, factorize, qr # LinearSolve `@reexport using SciMLBase`, so SciMLBase is reachable through it. const SciMLBase = LinearSolve.SciMLBase @@ -105,4 +106,67 @@ function SciMLBase.solve!(cache::LinearCache, alg::SWDRCFactorizationAlg; kwargs end end +# ------------------ +# QR algorithm (numerically stable, opt-in) +# ------------------ +# Mirrors SWDRCFactorizationAlg but factors via the augmented sparse QR. There is no +# `reuse_symbolic`/`strategy`/`refine`: SuiteSparseQR has no symbolic-reuse refactor (a value +# update always re-`qr`s), only the augmented mode exists, and augmented QR needs no refinement. +struct SWDRCQRFactorizationAlg <: AbstractSparseFactorization + check_pattern::Bool +end + +function SparseWithDenseRowColMatrices.SparseWithDenseRowColQRFactorization(; check_pattern::Bool = true) + return SWDRCQRFactorizationAlg(check_pattern) +end + +function LinearSolve.init_cacheval( + ::SWDRCQRFactorizationAlg, A::SparseWithDenseRowColMatrix{T}, b, u, Pl, Pr, + maxiters::Int, abstol, reltol, + verbose::Union{LinearVerbosity, Bool}, assumptions::OperatorAssumptions + ) where {T} + return SWDRCCacheval{T}(nothing) +end + +# Refresh the cached QR factorization. `refactor!` re-`qr`s in place (reusing the buffers and +# the bordered-matrix shape check); it throws on a singular `A` or a changed pattern, in which +# case we rebuild with `qr`. A genuinely singular `A` makes `qr` throw `SingularException`, +# which we map to a `nothing` cache → `Infeasible` (matching the LU algorithm's contract). +function _refresh_qr!(wrap::SWDRCCacheval, A::SparseWithDenseRowColMatrix, alg::SWDRCQRFactorizationAlg) + f = wrap.fact + if f isa SparseWithDenseRowColQRAugmented && + size(f) == size(A) && denserank(f) == denserank(A) + try + refactor!(f, A; check = alg.check_pattern) + return wrap + catch e + (e isa SingularException || e isa ArgumentError || e isa DimensionMismatch) || rethrow(e) + end + end + try + wrap.fact = qr(A) + catch e + e isa SingularException || rethrow(e) + wrap.fact = nothing + end + return wrap +end + +function SciMLBase.solve!(cache::LinearCache, alg::SWDRCQRFactorizationAlg; kwargs...) + if cache.isfresh + wrap = LinearSolve.@get_cacheval(cache, :SWDRCQRFactorizationAlg) + _refresh_qr!(wrap, cache.A, alg) + cache.isfresh = false + end + F = LinearSolve.@get_cacheval(cache, :SWDRCQRFactorizationAlg).fact + return if F !== nothing && issuccess(F) + y = ldiv!(cache.u, F, cache.b) + SciMLBase.build_linear_solution(alg, y, nothing, cache; retcode = SciMLBase.ReturnCode.Success) + else + SciMLBase.build_linear_solution( + alg, cache.u, nothing, cache; retcode = SciMLBase.ReturnCode.Infeasible + ) + end +end + end # module diff --git a/src/SparseWithDenseRowColMatrices.jl b/src/SparseWithDenseRowColMatrices.jl index 5119ff8..911c33b 100644 --- a/src/SparseWithDenseRowColMatrices.jl +++ b/src/SparseWithDenseRowColMatrices.jl @@ -1,8 +1,8 @@ module SparseWithDenseRowColMatrices using LinearAlgebra -using LinearAlgebra: Factorization, SingularException, ldiv!, lu, lu!, mul!, opnorm, - issuccess, factorize +using LinearAlgebra: Factorization, SingularException, ldiv!, lu, lu!, qr, qr!, mul!, opnorm, + issuccess, factorize, diag using SparseArrays using SparseArrays: AbstractSparseMatrixCSC, SparseMatrixCSC, nonzeros, rowvals, getcolptr, nnz, sparse @@ -15,13 +15,16 @@ include("matrix.jl") include("matvec.jl") include("woodbury.jl") include("augmented.jl") +include("qr.jl") include("factorize.jl") include("detect.jl") export SparseWithDenseRowColMatrix, SelectorMatrix, sparsepart, fillpart, lowrankfactors, exclusive_sparsepart, denserank, - SparseWithDenseRowColWoodbury, SparseWithDenseRowColAugmented, refactor!, update_lowrank!, - recommend_lowrank_peel, PeelRecommendation, SparseWithDenseRowColFactorization + SparseWithDenseRowColWoodbury, SparseWithDenseRowColAugmented, + SparseWithDenseRowColQRAugmented, refactor!, update_lowrank!, + recommend_lowrank_peel, PeelRecommendation, + SparseWithDenseRowColFactorization, SparseWithDenseRowColQRFactorization # --------------------- # Precompilation diff --git a/src/factorize.jl b/src/factorize.jl index 3eebbfa..6f37eeb 100644 --- a/src/factorize.jl +++ b/src/factorize.jl @@ -70,10 +70,50 @@ Alias for [`refactor!`](@ref) — refactor `F` in place with the new values of ` LinearAlgebra.lu!(F::SparseWithDenseRowColFactorization, A::SparseWithDenseRowColMatrix; kwargs...) = refactor!(F, A; kwargs...) -function LinearAlgebra.qr(::SparseWithDenseRowColMatrix) - throw(ArgumentError("SparseWithDenseRowColMatrix is solved by an LU-based factorization; use `factorize`/`lu`, not `qr`.")) +""" + qr(A::SparseWithDenseRowColMatrix; strategy=:auto) -> SparseWithDenseRowColQRAugmented + +QR-factorize `A = S + U V` for numerically stable repeated solves. Builds a single sparse QR +(SuiteSparseQR) of the bordered system `[S U; V -I]`, returning a +[`SparseWithDenseRowColQRAugmented`](@ref). Unlike [`factorize`](@ref)/[`lu`](@ref) (the +LU/Woodbury throughput path) this never forms `S⁻¹`, so it stays accurate even when the sparse +part `S` is ill-conditioned or nearly singular, as long as `A` itself is nonsingular — the +FastAlmostBandedMatrices regime. + +* `strategy = :auto` (default) or `:augmented`: the augmented bordered-system QR (the only mode). +* `strategy = :woodbury`: **not supported** — a Woodbury-over-qr(S) approach shares the + κ(S)·κ(C) cancellation of the LU Woodbury path and is catastrophically inaccurate on + ill-conditioned `S`, defeating the purpose of using QR. Throws. + +Only `Float64`/`ComplexF64` are supported (SuiteSparseQR's single-precision sparse QR is +unavailable on some Julia versions); use `factorize`/`lu` for `Float32`/`ComplexF32` and +generic eltypes. The factorization has no symbolic-reuse refactor and no +zero-allocation solve (SuiteSparseQR limitations); it is the stability path, not the +throughput path. Solve with `F \\ b` / `ldiv!(F, b)`; update values with [`refactor!`](@ref) / +[`qr!`](@ref) (a full re-`qr`). +""" +function LinearAlgebra.qr(A::SparseWithDenseRowColMatrix; strategy::Symbol = :auto) + strategy in (:auto, :augmented) && return _augmented_qr(A) + strategy === :woodbury && throw( + ArgumentError( + "qr does not support strategy = :woodbury: a Woodbury-over-qr(S) solve shares the \ + κ(S)·κ(C) cancellation of the LU Woodbury path and is inaccurate on ill-conditioned S, \ + defeating the purpose of QR. Use `qr(A; strategy = :augmented)` (the default), or \ + `factorize(A; strategy = :woodbury)` for the LU Woodbury path." + ) + ) + throw(ArgumentError("strategy must be :auto or :augmented for qr; got :$strategy")) end +""" + qr!(F::SparseWithDenseRowColQRAugmented, A::SparseWithDenseRowColMatrix; kwargs...) -> F + +Alias for [`refactor!`](@ref) — re-`qr` `F` in place with the new values of `A`. Note this is a +full re-factorization (SuiteSparseQR has no symbolic reuse), unlike [`lu!`](@ref). +""" +LinearAlgebra.qr!(F::SparseWithDenseRowColQRAugmented, A::SparseWithDenseRowColMatrix; kwargs...) = + refactor!(F, A; kwargs...) + Base.:\(A::SparseWithDenseRowColMatrix, b::AbstractVecOrMat) = factorize(A) \ b """ @@ -97,3 +137,20 @@ It is also the default algorithm `LinearSolve` picks for a `SparseWithDenseRowCo Only available when `LinearSolve` is loaded (provided by a package extension). """ function SparseWithDenseRowColFactorization end + +""" + SparseWithDenseRowColQRFactorization(; check_pattern=true) + +A [LinearSolve.jl](https://github.com/SciML/LinearSolve.jl) algorithm that solves +`SparseWithDenseRowColMatrix` systems with the numerically stable augmented sparse QR +([`SparseWithDenseRowColQRAugmented`](@ref)) through LinearSolve's caching interface. Opt-in +(the default LinearSolve algorithm for the matrix type remains the LU +[`SparseWithDenseRowColFactorization`](@ref)); choose this when `S` is ill-conditioned. + +* `check_pattern` — validate the pattern on each refactor (`false` skips the check). + +There is no `reuse_symbolic` option (SuiteSparseQR has no symbolic-reuse refactor, so a value +update always re-`qr`s), no `strategy` (only the augmented mode exists), and no `refine` (the +augmented QR needs no Woodbury refinement). Only available when `LinearSolve` is loaded. +""" +function SparseWithDenseRowColQRFactorization end diff --git a/src/qr.jl b/src/qr.jl new file mode 100644 index 0000000..cefcc3e --- /dev/null +++ b/src/qr.jl @@ -0,0 +1,204 @@ +# ------------------ +# Augmented QR factorization (numerically stable path) +# ------------------ +# +# Solve A x = b with A = S + U V via the bordered system +# +# [ S U ] [x] [b] +# [ V -I ] [y] = [0] +# +# factored as ONE sparse QR (SuiteSparseQR, `qr(::SparseMatrixCSC)`) of size (n+r). This is +# the numerically stable analogue of the augmented LU path: it never forms `S⁻¹`, so it stays +# accurate even when the sparse part `S` is ill-conditioned or nearly singular but the dense +# rows/columns regularize `A` (the FastAlmostBandedMatrices regime). The bordered system is +# full-rank and well-conditioned whenever `A` itself is nonsingular, regardless of κ(S) — which +# is why a Woodbury-over-qr(S) approach is NOT shipped: it shares the κ(S)·κ(C) cancellation +# and is catastrophically wrong (~1e-1 forward error) once κ(S) ≳ 1e11, while this path holds +# near the noise floor across the whole range. + +""" + SparseWithDenseRowColQRAugmented{T} <: LinearAlgebra.Factorization{T} + +QR factorization of a [`SparseWithDenseRowColMatrix`](@ref) `A = S + U V`, built as a single +sparse QR (SuiteSparseQR) of the bordered system `[S U; V -I]` of size `n + r`. This is the +numerically stable analogue of [`SparseWithDenseRowColAugmented`](@ref) (which uses LU): it +never forms `S⁻¹`, so it stays accurate even when `S` is ill-conditioned or nearly singular, +as long as `A` itself is nonsingular. Created by [`qr`](@ref). + +Only `Float64` and `ComplexF64` are supported; for `Float32`/`ComplexF32` and all generic +eltypes use [`factorize`](@ref)/[`lu`](@ref) (PureKLU). (SuiteSparseQR's single-precision +sparse QR behaves inconsistently across Julia versions — it throws on 1.11 — so the QR path +restricts itself to the double-precision floats, which work identically everywhere.) + +Unlike the LU path this path has no symbolic-reuse refactor (SuiteSparseQR exposes no `qr!`) +and no zero-allocation solve, so it is the *stability* path, not the *throughput* path. The +adjoint/transpose solve is supported via a separately built, lazily cached factorization of +the bordered system's adjoint. +""" +mutable struct SparseWithDenseRowColQRAugmented{T, QF} <: LinearAlgebra.Factorization{T} + Qaug::QF # QRSparse of [S U; V -I] + QaugH::Any # nothing until first adjoint/transpose solve, then a QRSparse + Maug::SparseMatrixCSC{T, Int} # owned bordered matrix (for refactor! and building QaugH) + n::Int + r::Int + rankaug::Int # numerical rank of Qaug (== n+r ⇒ A nonsingular) + rhs::Vector{T} # length n+r — [b; 0] / RHS scratch + rsol::Vector{T} # length n+r — receives `Qaug \ rhs` (QR has no in-place ldiv!) +end + +Base.size(F::SparseWithDenseRowColQRAugmented) = (F.n, F.n) +Base.size(F::SparseWithDenseRowColQRAugmented, i::Integer) = i ≤ 2 ? F.n : 1 +LinearAlgebra.issuccess(F::SparseWithDenseRowColQRAugmented) = F.rankaug == F.n + F.r +denserank(F::SparseWithDenseRowColQRAugmented) = F.r + +# SuiteSparseQR's single-precision support is inconsistent across Julia versions (Float32/ +# ComplexF32 sparse `qr` upcasts on 1.10, *throws* a CHOLMODException on 1.11, and is native on +# 1.12). To give identical behavior on every supported version we restrict the QR path to the +# double-precision BLAS floats, which work everywhere; Float32/ComplexF32 (and all generic +# eltypes) route to PureKLU's LU via `factorize`/`lu`. +_qr_supported(::Type{<:Union{Float64, ComplexF64}}) = true +_qr_supported(::Type) = false + +function _check_qr_eltype(::Type{T}) where {T} + _qr_supported(T) || throw( + ArgumentError( + "SparseWithDenseRowColQRAugmented (SuiteSparseQR) supports only Float64 and ComplexF64; " * + "got eltype $T. Use `factorize`/`lu` (PureKLU) for Float32/ComplexF32 and generic eltypes " * + "(SuiteSparseQR's single-precision sparse QR is unavailable on some Julia versions)." + ) + ) + return nothing +end + +# All `qr` calls below pass `tol = 0.0` to DISABLE SuiteSparseQR's default column deflation +# (`tol = -2` drops columns below ~20·(m+n)·ε·maxcolnorm and writes an exact 0 onto the +# R-diagonal). That default would report a merely ill-conditioned — but nonsingular and +# accurately solvable — bordered system as rank-deficient, making `qr(A)` throw on matrices +# that `factorize`/`klu` solve fine. With deflation off we make the rank decision ourselves +# from the R-diagonal, so the singularity cutoff matches the explicit threshold in `_qr_rank` +# rather than SuiteSparseQR's hidden one. +const _QR_NODEFLATE = 0.0 + +# SuiteSparseQR is rank-revealing; estimate rank from the R-diagonal against the standard +# threshold. `qr` does not throw on a rank-deficient bordered system (unlike `klu`), so this is +# how singularity of A is detected. +# The cutoff is `cond ≳ 1/(m·ε)` (~5e13 for Float64); genuine rank deficiency yields an +# exact-zero pivot and is caught regardless. A truly singular bulk is reported here as a +# `SingularException` (see `_augmented_qr`), matching the LU path's contract. +function _qr_rank(Fq, m::Int) + d = abs.(diag(Fq.R)) + isempty(d) && return 0 + tol = maximum(d) * m * eps(real(eltype(Fq))) + return count(>(tol), d) +end + +function _augmented_qr(A::SparseWithDenseRowColMatrix{T}) where {T} + _check_qr_eltype(T) + n = size(A, 1) + r = denserank(A) + Maug = SparseMatrixCSC{T, Int}(_augmented_matrix(A)) + Qaug = qr(Maug; tol = _QR_NODEFLATE) + rankaug = _qr_rank(Qaug, n + r) + rankaug == n + r || throw(SingularException(0)) # A singular ⇒ bordered system singular + return SparseWithDenseRowColQRAugmented{T, typeof(Qaug)}( + Qaug, nothing, Maug, n, r, rankaug, + Vector{T}(undef, n + r), Vector{T}(undef, n + r) + ) +end + +function LinearAlgebra.ldiv!(F::SparseWithDenseRowColQRAugmented{T}, b::AbstractVector) where {T} + length(b) == F.n || throw(DimensionMismatch()) + rhs = F.rhs + @inbounds copyto!(view(rhs, 1:F.n), b) + @inbounds for i in (F.n + 1):(F.n + F.r) + rhs[i] = zero(T) + end + copyto!(F.rsol, F.Qaug \ rhs) # out-of-place QR solve (allocates the solution) + @inbounds copyto!(b, view(F.rsol, 1:F.n)) + return b +end + +function LinearAlgebra.ldiv!(F::SparseWithDenseRowColQRAugmented, B::AbstractMatrix) + size(B, 1) == F.n || throw(DimensionMismatch()) + for c in axes(B, 2) + ldiv!(F, view(B, :, c)) + end + return B +end + +LinearAlgebra.ldiv!(y::AbstractVector, F::SparseWithDenseRowColQRAugmented, b::AbstractVector) = + ldiv!(F, copyto!(y, b)) +LinearAlgebra.ldiv!(Y::AbstractMatrix, F::SparseWithDenseRowColQRAugmented, B::AbstractMatrix) = + ldiv!(F, copyto!(Y, B)) + +# Adjoint / transpose solve. The bordered matrix Mᴴ = [Sᴴ Vᴴ; Uᴴ -I] has Schur complement +# Sᴴ + Vᴴ Uᴴ = (S + U V)ᴴ = Aᴴ (and Mᵀ ↔ Aᵀ), so solving Mᴴ[x;y]=[b;0] gives x = A⁻ᴴ b. Unlike +# the LU path, SuiteSparseQR has no `Qaug' \ b`, so we build a SEPARATE qr of the adjoint and +# cache it lazily (it is invalidated on every `refactor!`). +function _augfact_adj!(F::SparseWithDenseRowColQRAugmented) + F.QaugH === nothing && (F.QaugH = qr(sparse(F.Maug'); tol = _QR_NODEFLATE)) + return F.QaugH +end +# For real T, transpose == adjoint, so reuse the cached factorization; for complex T the +# (rare) transpose path rebuilds qr(transpose(Maug)) each call rather than holding a 3rd cache. +_augfact_tr!(F::SparseWithDenseRowColQRAugmented{<:Real}) = _augfact_adj!(F) +_augfact_tr!(F::SparseWithDenseRowColQRAugmented) = qr(sparse(transpose(F.Maug)); tol = _QR_NODEFLATE) + +for (Wrap, getfact) in ((AdjointFact, :_augfact_adj!), (TransposeFact, :_augfact_tr!)) + @eval function LinearAlgebra.ldiv!( + Fw::$Wrap{<:Any, <:SparseWithDenseRowColQRAugmented{T}}, b::AbstractVector + ) where {T} + F = parent(Fw) + length(b) == F.n || throw(DimensionMismatch()) + rhs = F.rhs + @inbounds copyto!(view(rhs, 1:F.n), b) + @inbounds for i in (F.n + 1):(F.n + F.r) + rhs[i] = zero(T) + end + copyto!(F.rsol, $getfact(F) \ rhs) + @inbounds copyto!(b, view(F.rsol, 1:F.n)) + return b + end + @eval function LinearAlgebra.ldiv!(Fw::$Wrap{<:Any, <:SparseWithDenseRowColQRAugmented}, B::AbstractMatrix) + for c in axes(B, 2) + ldiv!(Fw, view(B, :, c)) + end + return B + end +end + +# Complex RHS over a real augmented-QR factorization (real/imag split; see woodbury.jl). +LinearAlgebra.ldiv!(F::SparseWithDenseRowColQRAugmented{<:Real}, b::AbstractVector{<:Complex}) = + _ldiv_realfact_complex!(F, b) +LinearAlgebra.ldiv!(Fw::AdjointFact{<:Any, <:SparseWithDenseRowColQRAugmented{<:Real}}, b::AbstractVector{<:Complex}) = + _ldiv_realfact_complex!(Fw, b) +LinearAlgebra.ldiv!(Fw::TransposeFact{<:Any, <:SparseWithDenseRowColQRAugmented{<:Real}}, b::AbstractVector{<:Complex}) = + _ldiv_realfact_complex!(Fw, b) + +""" + refactor!(F::SparseWithDenseRowColQRAugmented, A::SparseWithDenseRowColMatrix; check=true) -> F + +Rebuild the augmented system with the new values of `A` and re-factor. Unlike the LU paths +there is **no symbolic-reuse fast path** — SuiteSparseQR exposes no `qr!`, so this always +re-`qr`s the full bordered system. For hot refactor loops (Newton / time stepping) prefer +[`factorize`](@ref)/[`lu`](@ref), which reuse PureKLU's symbolic analysis; the QR path is the +stability path, not the throughput path. The shape (`n`, `r`) must be unchanged. +""" +function refactor!(F::SparseWithDenseRowColQRAugmented{T}, A::SparseWithDenseRowColMatrix; check::Bool = true) where {T} + denserank(A) == F.r && size(A, 1) == F.n || + throw(DimensionMismatch("shape changed; rebuild with `qr`.")) + F.Maug = SparseMatrixCSC{T, Int}(_augmented_matrix(A)) + F.Qaug = qr(F.Maug; tol = _QR_NODEFLATE) + F.QaugH = nothing # invalidate cached adjoint + F.rankaug = _qr_rank(F.Qaug, F.n + F.r) + check && (F.rankaug == F.n + F.r || throw(SingularException(0))) + return F +end + +# The augmented-QR path embeds U/V inside one factorization and keeps no separate factorization +# of S to reuse, so the Woodbury low-rank fast path does not apply here (same as augmented LU). +function update_lowrank!(::SparseWithDenseRowColQRAugmented; kwargs...) + throw(ArgumentError("update_lowrank! is the Woodbury fast path (it reuses S's factorization); \ + the augmented QR path embeds U/V in one factorization and has no separate S factorization \ + to reuse — update with `refactor!` or rebuild with `qr`.")) +end diff --git a/test/runtests.jl b/test/runtests.jl index fc11674..78949fb 100644 --- a/test/runtests.jl +++ b/test/runtests.jl @@ -13,6 +13,9 @@ using SafeTestsets, Test @safetestset "Augmented fallback" begin include("test_augmented.jl") end + @safetestset "QR factorization (augmented, stable)" begin + include("test_qr.jl") + end @safetestset "Refactorization (Newton path)" begin include("test_refactor.jl") end @@ -38,6 +41,17 @@ using SafeTestsets, Test # structured matrices (`Diagonal`/`Bidiagonal`/`Tridiagonal`/`Triangular`) in # `*`/`mul!` — combinations that never arise in use, and where both candidate # methods compute the same product anyway. - Aqua.test_all(SparseWithDenseRowColMatrices; ambiguities = false, piracies = true) + # + # `persistent_tasks` uses a generous `tmax`: neither this package nor PureKLU spawns any + # task at module load (verified — no `@async`/`Threads.@spawn`/`Timer`/`__init__`; the + # load subprocess exits in ~0.4s), so the check has nothing real to catch. Its default + # 10s wall-clock budget for the spawned subprocess to *exit* is, however, flaky on a + # loaded CI host (heavier first-load on 1.12), producing a false positive under + # precompile contention. The larger budget removes that flake while still flagging a + # genuinely hung load. + Aqua.test_all( + SparseWithDenseRowColMatrices; + ambiguities = false, piracies = true, persistent_tasks = (; tmax = 60), + ) end end diff --git a/test/test_edgecases.jl b/test/test_edgecases.jl index c4f3e38..373e727 100644 --- a/test/test_edgecases.jl +++ b/test/test_edgecases.jl @@ -91,7 +91,10 @@ end @test_throws SingularException factorize(Asing) end -@testset "qr is intentionally unsupported" begin +@testset "qr returns the augmented QR factorization" begin A = rand_sparsedense(20, 2) - @test_throws ArgumentError qr(A) + b = randn(20) + F = qr(A) + @test F isa SparseWithDenseRowColQRAugmented + @test F \ b ≈ Matrix(A) \ b end diff --git a/test/test_linearsolve.jl b/test/test_linearsolve.jl index 5b879e8..5e11817 100644 --- a/test/test_linearsolve.jl +++ b/test/test_linearsolve.jl @@ -65,6 +65,58 @@ end @test solve!(cache).retcode == ReturnCode.Infeasible end +@testset "QR algorithm: solve, cache reuse, and Infeasible parity" begin + A = rand_sparsedense(n, r; seed = 41) + b = randn(n) + cache = init(LinearProblem(A, b), SparseWithDenseRowColQRFactorization()) + s1 = solve!(cache) + @test s1.retcode == ReturnCode.Success + @test s1.u ≈ Matrix(A) \ b + + # new values, same pattern → refactor! re-qrs in the cache + A2 = SparseWithDenseRowColMatrix( + SparseMatrixCSC( + size(A)..., copy(sparsepart(A).colptr), copy(sparsepart(A).rowval), + sparsepart(A).nzval .* 1.5 .+ 0.01 + ), + lowrankfactors(A)[1], Matrix(fillpart(A)) .+ 0.2 + ) + cache.A = A2 + @test solve!(cache).u ≈ Matrix(A2) \ b + # new RHS only + b2 = randn(n) + cache.b = b2 + @test solve!(cache).u ≈ Matrix(A2) \ b2 + + # singular A → Infeasible, no throw (parity with the LU algorithm) + nn = 20 + S = sparse(2.0 * I, nn, nn) + S[5, 5] = 0.0 + Asing = SparseWithDenseRowColMatrix(S, zeros(nn, 1), zeros(1, nn)) + sol = solve(LinearProblem(Asing, randn(nn)), SparseWithDenseRowColQRFactorization()) + @test sol.retcode == ReturnCode.Infeasible +end + +@testset "QR algorithm stays accurate on ill-conditioned S" begin + # Where the default (Woodbury) algorithm would lose accuracy, the opt-in QR algorithm holds. + nn, rr = 120, 3 + S = spdiagm(0 => fill(4.0, nn), 1 => fill(-1.0, nn - 1), -1 => fill(-1.0, nn - 1)) + for i in 1:rr + S[i, :] .*= 1.0e-13 + end + dropzeros!(S) + Fdense = zeros(rr, nn) + for i in 1:rr + Fdense[i, i] = 1.0; Fdense[i, i + rr] = 0.3 + end + A = SparseWithDenseRowColMatrix(S, Fdense; replace = true) + b = randn(nn) + xref = Float64.(big.(Matrix(A)) \ big.(b)) + sol = solve(LinearProblem(A, b), SparseWithDenseRowColQRFactorization()) + @test sol.retcode == ReturnCode.Success + @test norm(sol.u .- xref) / norm(xref) < 1.0e-12 +end + @testset "singular S routes through the augmented fallback under caching" begin # S singular (zeroed top rows), A nonsingular via the dense fill rows nn, rr = 50, 2 diff --git a/test/test_qr.jl b/test/test_qr.jl new file mode 100644 index 0000000..70ed764 --- /dev/null +++ b/test/test_qr.jl @@ -0,0 +1,229 @@ +include("testutils.jl") +using Test, LinearAlgebra, SparseArrays + +# A = S + U V with an ILL-conditioned sparse part S (κ(S) ≈ `kappa`) but a well-conditioned +# assembled A (κ(A) ≈ 6): the top `r` rows of the tridiagonal S are scaled by 1/kappa (making +# S nearly rank-deficient there) and the dense fill replaces exactly those rows with O(1) +# entries. This is the FastAlmostBandedMatrices regime where forming S⁻¹ (Woodbury) loses +# accuracy but the augmented QR — which never forms S⁻¹ — does not. +function illcond_S(n, r, kappa; T = Float64) + S = spdiagm(0 => fill(T(4), n), 1 => fill(T(-1), n - 1), -1 => fill(T(-1), n - 1)) + for i in 1:r + S[i, :] .*= inv(T(kappa)) + end + dropzeros!(S) + F = zeros(T, r, n) + for i in 1:r + F[i, i] = one(T) + F[i, i + r] = T(0.3) + end + return SparseWithDenseRowColMatrix(S, F; replace = true) +end + +# Exact solve of the (rounded) working-precision matrix, computed in BigFloat — the reference +# every forward-error assertion compares against (matches how the design was measured). +function exact_solve(A, b) + x = big.(Matrix(A)) \ big.(b) + return convert.(eltype(b), x) +end + +const QRSparseT = SparseArrays.SPQR.QRSparse + +@testset "well-conditioned: qr agrees with factorize at the noise floor" begin + A = rand_sparsedense(300, 4; seed = 7) + b = randn(300) + xref = exact_solve(A, b) + F = qr(A) + @test F isa SparseWithDenseRowColQRAugmented + @test issuccess(F) + @test size(F) == (300, 300) + @test denserank(F) == 4 + @test relerr(F \ b, xref) < 1.0e-12 + @test relerr(factorize(A) \ b, xref) < 1.0e-10 # both at noise floor +end + +@testset "ill-conditioned S: augmented QR holds where Woodbury degrades" begin + n, r = 200, 4 + b = randn(MersenneTwister(0), n) + # The load-bearing regression guard: across the entire κ(S) sweep — including κ(S) ≥ 3.6e11 + # where Woodbury (even with refinement) loses accuracy — the augmented QR stays ≤ 1e-13. + for kappa in (3.6e6, 3.6e10, 3.6e11, 3.6e12, 3.6e14) + A = illcond_S(n, r, kappa) + xref = exact_solve(A, b) + F = qr(A) + @test issuccess(F) + @test relerr(F \ b, xref) < 1.0e-13 + end + # And at the top of the sweep the QR is decisively better than the default Woodbury path. + A = illcond_S(n, r, 3.6e14) + xref = exact_solve(A, b) + err_qr = relerr(qr(A) \ b, xref) + err_wood = relerr(factorize(A; strategy = :woodbury, auto_fallback = false, refine = 2) \ b, xref) + @test err_qr < 1.0e-13 + @test err_wood > 1.0e3 * err_qr +end + +@testset "strategy handling and no Woodbury-QR footgun" begin + A = rand_sparsedense(40, 2; seed = 2) + @test qr(A; strategy = :auto) isa SparseWithDenseRowColQRAugmented + @test qr(A; strategy = :augmented) isa SparseWithDenseRowColQRAugmented + @test_throws ArgumentError qr(A; strategy = :woodbury) + @test_throws ArgumentError qr(A; strategy = :bogus) + # The Woodbury-over-qr(S) mode is intentionally not shipped (shares the κ(S)·κ(C) + # cancellation): there must be no such type to silently re-introduce it. + @test !isdefined(SparseWithDenseRowColMatrices, :SparseWithDenseRowColQR) +end + +@testset "eltypes: Float64/ComplexF64 succeed, others error (not silent Float64)" begin + for T in (Float64, ComplexF64) + A = rand_sparsedense(50, 3; seed = 4, T = T) + b = T <: Complex ? (randn(50) .+ im .* randn(50)) .|> T : randn(T, 50) + F = qr(A) + @test F isa SparseWithDenseRowColQRAugmented{T} + @test relerr(F \ b, exact_solve(A, b)) < 1.0e-11 + end + # Float32/ComplexF32 are intentionally unsupported (SuiteSparseQR single precision is + # version-dependent — it throws on Julia 1.11); they must error, not silently upcast. + for T in (Float32, ComplexF32, Rational{Int}, Int, BigFloat) + # build an exact-typed matrix (randn→Int would itself throw before qr is reached) + nn = 20 + S = spdiagm(0 => fill(T(4), nn), 1 => fill(T(-1), nn - 1), -1 => fill(T(-1), nn - 1)) + U = zeros(T, nn, 2); U[1, 1] = one(T); U[2, 2] = one(T) + V = zeros(T, 2, nn); V[1, 1] = one(T); V[2, 2] = one(T) + A = SparseWithDenseRowColMatrix(S, U, V) + @test_throws ArgumentError qr(A) # must error, not return a Float64 factorization + end +end + +@testset "adjoint & transpose solve (lazy cached QaugH)" begin + A = rand_sparsedense(120, 3; seed = 11) + M = Matrix(A) + b = randn(120) + F = qr(A) + @test F.QaugH === nothing # not built until first adjoint/transpose solve + @test relerr(F' \ b, M' \ b) < 1.0e-11 + @test F.QaugH isa QRSparseT # lazily built + cached = F.QaugH + @test relerr(F' \ b, M' \ b) < 1.0e-11 + @test F.QaugH === cached # second adjoint solve reuses it (no rebuild) + @test relerr(transpose(F) \ b, transpose(M) \ b) < 1.0e-11 + x = copy(b) + ldiv!(F', x) + @test relerr(x, M' \ b) < 1.0e-11 + # complex A: adjoint ≠ transpose + Ac = rand_sparsedense(80, 2; seed = 12, T = ComplexF64) + Mc = Matrix(Ac) + bc = randn(ComplexF64, 80) + Fc = qr(Ac) + @test relerr(Fc' \ bc, Mc' \ bc) < 1.0e-10 + @test relerr(transpose(Fc) \ bc, transpose(Mc) \ bc) < 1.0e-10 +end + +@testset "complex RHS over a real factorization" begin + A = rand_sparsedense(60, 2; seed = 13) + M = Matrix(A) + b = randn(60) .+ im .* randn(60) + F = qr(A) + @test relerr(F \ b, M \ b) < 1.0e-11 + @test relerr(F' \ b, M' \ b) < 1.0e-11 +end + +@testset "matrix RHS and 3-arg ldiv!" begin + A = rand_sparsedense(90, 3; seed = 14) + M = Matrix(A) + B = randn(90, 5) + F = qr(A) + @test relerr(F \ B, M \ B) < 1.0e-11 + Y = similar(B) + ldiv!(Y, F, B) + @test relerr(Y, M \ B) < 1.0e-11 +end + +@testset "refactor! / qr! (full re-qr, no symbolic reuse)" begin + A1 = rand_sparsedense(100, 3; seed = 21) + A2 = rand_sparsedense(100, 3; seed = 22) # same pattern, new values + b = randn(100) + F = qr(A1) + _ = F' \ b # force QaugH to be built + @test F.QaugH isa QRSparseT + refactor!(F, A2) + @test F.QaugH === nothing # adjoint cache invalidated on refactor + @test relerr(F \ b, Matrix(A2) \ b) < 1.0e-11 + # qr! is an alias for refactor! + A3 = rand_sparsedense(100, 3; seed = 23) + qr!(F, A3) + @test relerr(F \ b, Matrix(A3) \ b) < 1.0e-11 + # a shape change must be rejected + Abig = rand_sparsedense(101, 3; seed = 24) + @test_throws DimensionMismatch refactor!(F, Abig) + Arank = rand_sparsedense(100, 4; seed = 25) + @test_throws DimensionMismatch refactor!(F, Arank) +end + +@testset "singularity: singular A throws, ill-conditioned S succeeds" begin + # A genuinely singular A (a zero column the dense fill does not supply) ⇒ the bordered + # system is rank-deficient ⇒ SingularException, matching the LU path's contract. + n, r = 40, 2 + S = spdiagm(0 => fill(3.0, n), -1 => fill(-1.0, n - 1)) + S[:, n] .= 0.0 + S[n, :] .= 0.0 + dropzeros!(S) + F = zeros(r, n) + for i in 1:r + F[i, i] = 1.0 # dense rows do not touch column n ⇒ A column n is all zero + end + Asing = SparseWithDenseRowColMatrix(S, F; replace = true) + @test_throws SingularException qr(Asing) + @test_throws SingularException factorize(Asing; strategy = :augmented) # LU agrees + + # Ill-conditioned S but nonsingular A must SUCCEED (the whole point of the path). + Aok = illcond_S(80, 3, 3.6e13) + Fok = qr(Aok) + @test issuccess(Fok) + b = randn(80) + @test relerr(Fok \ b, exact_solve(Aok, b)) < 1.0e-13 +end + +@testset "ill-conditioned ASSEMBLED A does not spuriously throw (SPQR deflation off)" begin + # Regression: SuiteSparseQR's default `tol = -2` deflates ill-conditioned-but-nonsingular + # columns and would make qr(A) throw SingularException on matrices that factorize/klu solve + # fine. With deflation disabled, an assembled A up to cond ~1e13 must factor and solve to + # roughly its conditioning-limited accuracy (κ(A)·eps), matching the LU path's behavior. + function illcond_assembled(n; kappa) + d = collect(exp.(range(0, -log(kappa); length = n))) # graded 1 .. 1/kappa + S = spdiagm(0 => d) + U = zeros(n, 1); U[1, 1] = 1.0e-8 + V = zeros(1, n); V[1, 1] = 1.0e-8 # negligible low-rank part + return SparseWithDenseRowColMatrix(S, U, V) + end + b = randn(MersenneTwister(0), 120) + for kappa in (1.0e10, 1.0e12, 1.0e13) + A = illcond_assembled(120; kappa = kappa) + F = qr(A) # must NOT throw + @test issuccess(F) + xref = Float64.(big.(Matrix(A)) \ big.(b)) + # backward-stable: forward error tracks κ(A)·eps, not the noise floor + @test relerr(F \ b, xref) < 1.0e3 * cond(Matrix(A)) * eps() + # the augmented-LU path also solves these (contract parity — neither throws) + @test issuccess(factorize(A; strategy = :augmented)) + end +end + +@testset "update_lowrank! is not supported on the QR path" begin + A = rand_sparsedense(30, 2; seed = 31) + F = qr(A) + @test_throws ArgumentError update_lowrank!(F) +end + +@testset "r == 0 edge case reduces to qr(S)" begin + n = 50 + S = spdiagm(0 => fill(4.0, n), 1 => fill(-1.0, n - 1), -1 => fill(-1.0, n - 1)) + A = SparseWithDenseRowColMatrix(S, zeros(0, n)) # no dense rows/cols + @test denserank(A) == 0 + F = qr(A) + @test denserank(F) == 0 + @test issuccess(F) + b = randn(n) + @test relerr(F \ b, Matrix(S) \ b) < 1.0e-11 + @test relerr(F' \ b, Matrix(S)' \ b) < 1.0e-11 +end