One-shot structured lstsq over-allocates — post-hoc check matvecs and pivoted-QR Q materialization dominate #15

@ChrisRackauckas-Claude


WHAT: _structured_setup allocates 7.0 MB at n=5000, r=8, and the total grows linearly in n. Per-step hotspots: the pivoted QR qr(hcat(Z,Vadj), ColumnNorm()) plus Matrix(QB.Q) (lstsq.jl:222-223) = 3.99 MB, hcat = 0.64 MB, Matrix(V') = 0.32 MB. klu itself accounts for 2.37 MB, which is irreducible. The one-shot wrapper additionally runs a post-hoc optimality check (lstsq.jl:288-292) using the out-of-place products A*x, A'*res, and A'*collect(b), which costs ~15 MB extra per call. Note: the input to the QR is tall (n×2r), so Matrix(QB.Q) is already n×2r, not n×n; densifying Q is not the issue.

WHY IT MATTERS: This is a performance problem, not a correctness problem. It matters for the one-shot path at large n.

FIX:
(1) Replace the post-hoc check's out-of-place matvecs with preallocated buffers and the in-place 5-arg mul! already in src/matvec.jl; skip collect(TT,b) when b is already a Vector{TT}.
(2) Write Z and Vadj into one preallocated n×2r buffer instead of calling hcat.
(3) Use LAPACK.geqp3! plus thin-Q application (ormqr) instead of Matrix(QB.Q) when only W=Q[:,1:s] is needed.

EFFORT: M.
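A rough sketch of what fixes (1) and (2) could look like, for discussion only. It uses a plain matrix A and the generic LinearAlgebra.mul! rather than the structured operator and the mul! method in src/matvec.jl. The names CheckWorkspace, check_workspace, as_vector, optimality_ratio!, and fill_block! are hypothetical and do not exist in lstsq.jl. The ratio returned here is also only an assumed form of the optimality check.

```julia
using LinearAlgebra

# Hypothetical workspace for the post-hoc check; field names are illustrative.
struct CheckWorkspace{T}
    res::Vector{T}    # length m: residual b - A*x
    grad::Vector{T}   # length n: A' * res
    atb::Vector{T}    # length n: A' * b
end

check_workspace(::Type{T}, m::Integer, n::Integer) where {T} =
    CheckWorkspace{T}(Vector{T}(undef, m), Vector{T}(undef, n), Vector{T}(undef, n))

# Skip the copy when b already has the right concrete type.
as_vector(::Type{T}, b::Vector{T}) where {T} = b
as_vector(::Type{T}, b::AbstractVector) where {T} = collect(T, b)

# Assumed form of the check: ||A'(b - A*x)|| / ||A'b||, computed in place.
function optimality_ratio!(ws::CheckWorkspace{T}, A, x, b) where {T}
    bv = as_vector(T, b)
    copyto!(ws.res, bv)                    # res = b
    mul!(ws.res, A, x, -one(T), one(T))    # res = -A*x + res  (5-arg mul!)
    mul!(ws.grad, A', ws.res)              # grad = A' * res
    mul!(ws.atb, A', bv)                   # atb = A' * b
    return norm(ws.grad) / max(norm(ws.atb), eps(real(T)))
end

# Fix (2): write Z and Vadj side by side into a preallocated n×(rz+rv) buffer.
# Passing a lazy adjoint V' as Vadj also avoids a separate Matrix(V') copy.
function fill_block!(B::AbstractMatrix, Z::AbstractMatrix, Vadj::AbstractMatrix)
    rz, rv = size(Z, 2), size(Vadj, 2)
    size(B) == (size(Z, 1), rz + rv) || throw(DimensionMismatch("B must be n×(rz+rv)"))
    B[:, 1:rz] .= Z                # broadcast assignment writes in place
    B[:, rz+1:rz+rv] .= Vadj
    return B
end
```

The workspace would be allocated once per problem size and reused across calls, so the check itself should add no O(n) allocations beyond that one-time setup.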

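For fix (3), a minimal sketch of forming only W = Q[:,1:s] from an in-place pivoted QR. It assumes the n×2r buffer may be overwritten and that the numerical rank s is read off the diagonal of R with a relative tolerance. The function name thin_basis! and the rtol default are hypothetical; the actual rank criterion in lstsq.jl may differ.

```julia
using LinearAlgebra
using LinearAlgebra: LAPACK, BlasFloat

# Overwrites B with its pivoted-QR factors and returns the first s columns of Q.
function thin_basis!(B::Matrix{T}; rtol = sqrt(eps(real(T)))) where {T<:BlasFloat}
    n, k = size(B)
    # In-place column-pivoted QR: R ends up in the upper triangle of B,
    # the Householder reflectors below it, with scalar factors in tau.
    _, tau, jpvt = LAPACK.geqp3!(B)

    # Assumed rank rule: count diagonal entries of R above rtol * |R[1,1]|.
    d = abs(B[1, 1])
    s = d == 0 ? 0 : count(i -> abs(B[i, i]) > rtol * d, 1:min(n, k))

    # W = Q * [I_s; 0], applied via the reflectors without materializing Q.
    W = zeros(T, n, s)
    for j in 1:s
        W[j, j] = one(T)
    end
    s > 0 && LAPACK.ormqr!('L', 'N', B, tau, W)
    return W, s, jpvt
end
```

Compared with qr(...) followed by Matrix(QB.Q), this avoids the input copy made by qr and allocates only an n×s result. Whether that recovers a meaningful share of the 3.99 MB would need to be measured.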

Priority: medium. Filed from an automated next-steps audit of the QR/lstsq work (see PR #6).
