When a GLS problem involves hundreds of equations, the $K \times K$ covariance matrix becomes the computational bottleneck. A simple statistical remedy is to assume that most of the cross‑equation dependence can be captured by a *handful of latent factors* plus equation‑specific noise. This “low‑rank + diagonal” assumption slashes the number of unknowns from roughly $K^2$ to about $K \times k$ parameters, where **k** (the latent factor rank) is much smaller than $K$. The model alone, however, does **not** guarantee speed: we still have to fit the parameters.
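
In the standard factor parameterisation implied by “low‑rank + diagonal” (spelled out here for concreteness), the working covariance is

$$
\hat{\Sigma} = F F^{\top} + D,
\qquad F \in \mathbb{R}^{K \times k},
\quad D = \operatorname{diag}(d_1, \dots, d_K),
$$

so fitting $F$ and $D$ involves $Kk + K$ numbers rather than the $K(K+1)/2$ entries of an unrestricted covariance; with $K = 120$ and $k = 3$ that is 480 parameters instead of 7,260.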
### Installation
Install the library from PyPI:
```bash
pip install alsgls
```
For local development, clone the repo and use an editable install:
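
```bash
# assuming you are in the root of the cloned repository
pip install -e .
```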
See `examples/compare_als_vs_em.py` for a complete ALS versus EM comparison, including peak memory (via Memray, Fil, or the POSIX RSS high-water mark).

Background material and reproducible experiments are available in the notebooks under [`als_sim/`](als_sim/), such as [`als_sim/als_comparison.ipynb`](als_sim/als_comparison.ipynb) and [`als_sim/als_sur.ipynb`](als_sim/als_sur.ipynb).
### Solving low‑rank GLS: EM versus ALS
The classic EM algorithm alternates between updating the regression coefficients $\beta$ and updating the factor loadings $F$ and the diagonal noise $D$. Even though $\hat{\Sigma}$ is low‑rank, EM’s M‑step recreates the **full** $K \times K$ inverse, wiping out the memory win.

An alternative is **Alternating‑Least‑Squares (ALS)**. The Woodbury identity reduces the expensive inverse to a tiny $k \times k$ system, and the $\beta$‑update can be written without explicitly forming the dense matrix at all. In practice, ALS converges in 5–6 sweeps and never allocates more than $O(Kk)$ memory, while EM allocates $O(K^2)$.
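
To see why the Woodbury step is cheap, here is a minimal NumPy sketch (illustrative only, not the `alsgls` source; the names `woodbury_solve`, `F`, `d`, and `r` are ours) of applying $\hat{\Sigma}^{-1} = (FF^{\top} + D)^{-1}$ to a residual vector while only ever factorising a $k \times k$ matrix:

```python
import numpy as np

def woodbury_solve(F, d, r):
    """Compute (F F' + diag(d))^{-1} r without forming the K x K matrix.

    Woodbury: (D + F F')^{-1} = D^{-1} - D^{-1} F (I_k + F' D^{-1} F)^{-1} F' D^{-1},
    so only the k x k "core" matrix is ever factorised.
    """
    Dinv_r = r / d                                # D^{-1} r,  O(K)
    Dinv_F = F / d[:, None]                       # D^{-1} F,  O(K k)
    core = np.eye(F.shape[1]) + F.T @ Dinv_F      # I_k + F' D^{-1} F,  k x k
    return Dinv_r - Dinv_F @ np.linalg.solve(core, F.T @ Dinv_r)

# Sanity check against the dense route (only feasible for small K).
rng = np.random.default_rng(0)
K, k = 120, 3
F = rng.standard_normal((K, k))
d = rng.uniform(0.5, 2.0, size=K)
r = rng.standard_normal(K)

dense = np.linalg.solve(F @ F.T + np.diag(d), r)    # allocates a dense K x K matrix
assert np.allclose(dense, woodbury_solve(F, d, r))  # same result, only O(K k) memory
```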
**Rule of thumb:** if your GLS routine keeps looping between $\beta$ and a fresh $\hat{\Sigma}$, replacing the $\hat{\Sigma}$‑update by a factor‑ALS step yields the same statistical fit with an order‑of‑magnitude smaller memory footprint.
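
As a rough illustration of such a factor‑ALS step, the sketch below fits a rank‑$k$ plus diagonal approximation to the current residual matrix by plain alternating least squares (a simplified stand‑in, not the `alsgls` implementation: it skips the $D$‑weighting and the Woodbury‑based $\beta$‑step a full treatment would use):

```python
import numpy as np

def als_sigma_update(R, k, n_sweeps=6, ridge=1e-8):
    """Fit Sigma-hat ~= F F' + diag(d) to a residual matrix R (N x K) by ALS."""
    N, K = R.shape
    rng = np.random.default_rng(0)
    F = 0.1 * rng.standard_normal((K, k))                    # initial loadings, K x k
    for _ in range(n_sweeps):
        # Scores given loadings: Z = R F (F'F + ridge I)^{-1}   (k x k inverse)
        Z = R @ F @ np.linalg.inv(F.T @ F + ridge * np.eye(k))
        # Loadings given scores: F = R' Z (Z'Z + ridge I)^{-1}  (k x k inverse)
        F = R.T @ Z @ np.linalg.inv(Z.T @ Z + ridge * np.eye(k))
    Z = R @ F @ np.linalg.inv(F.T @ F + ridge * np.eye(k))    # final scores
    d = np.mean((R - Z @ F.T) ** 2, axis=0)                   # equation-specific noise
    F = F @ np.linalg.cholesky(Z.T @ Z / N)                   # absorb score scale into F
    return F, d                                               # Sigma-hat = F F' + diag(d)
```

Throughout, the largest arrays are $N \times k$ and $K \times k$; no $K \times K$ object is ever formed.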
### Beyond SUR: where the idea travels
Random‑effects models, feasible GLS with estimated heteroskedastic weights, optimal‑weight GMM, and spatial autoregressive GLS all iterate $\beta \leftrightarrow \hat{\Sigma}$. Each can adopt the same ALS trick: treat the weight matrix as low‑rank + diagonal, invert only the $k \times k$ core, and avoid the dense $K \times K$ algebra. Memory savings in published examples range from 5× to 20×, depending on $k$.
### A concrete case‑study: Seemingly‑Unrelated Regressions
To show the magnitude, we ran a Monte‑Carlo experiment with $N = 300$ observations, three regressors, rank‑3 factors, and $K$ set to 50, 80, and 120. EM was given 45 iterations; ALS, six sweeps. The largest array EM holds is the dense $\hat{\Sigma}^{-1}$, whereas ALS’s largest is the skinny factor matrix $F$. The table summarises six replications:

| K | β‑RMSE EM | β‑RMSE ALS | Peak memory EM (MB) | Peak memory ALS (MB) | Memory ratio |
| --- | --- | --- | --- | --- | --- |

Statistically, the two estimators are indistinguishable (paired‑test p ≥ 0.14). Computationally, ALS needs only a few megabytes whereas EM needs tens to hundreds.
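
For readers who want to set up a comparable experiment, the following sketch shows how such a SUR panel could be simulated (the shapes match the description above; the actual generator in `examples/` and `als_sim/` may differ, and sharing the same three regressors across equations is a simplification):

```python
import numpy as np

rng = np.random.default_rng(42)
N, p, k, K = 300, 3, 3, 120             # observations, regressors, factor rank, equations

X = rng.standard_normal((N, p))         # regressors (shared across equations here)
B = rng.standard_normal((p, K))         # per-equation coefficients
F_true = rng.standard_normal((K, k)) / np.sqrt(k)
d_true = rng.uniform(0.3, 1.0, size=K)

Z = rng.standard_normal((N, k))                                    # latent factor scores
E = Z @ F_true.T + rng.standard_normal((N, K)) * np.sqrt(d_true)   # errors: Cov = F F' + diag(d)
Y = X @ B + E                                                      # N x K response panel
```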
### Defaults, tuning knobs, and failure modes
An ill-conditioned or poorly scaled design can make the β-step CG solve slow; adjust `cg_tol`/`cg_maxit`, add a stronger ridge, or re-scale predictors. If `info["accept_t"]` stays at zero and the NLL does not improve, the factor rank may be too large relative to the sample size.