
Commit 1605fb3

.rst to .md for docs and DRY for readme/docs
1 parent 36d2df8 commit 1605fb3

23 files changed

Lines changed: 1124 additions & 1195 deletions

.github/workflows/python-publish.yml

Lines changed: 30 additions & 0 deletions

@@ -6,8 +6,38 @@ on:
   workflow_dispatch:
 
 jobs:
+  sync-citation:
+    name: Sync citation.cff with package version
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Sync citation
+        uses: gojiplus/citation-sync@main
+        with:
+          citation_path: CITATION.cff
+          package_path: pyproject.toml
+
+      - name: Commit and push if changed
+        run: |
+          git config --local user.email "action@github.com"
+          git config --local user.name "GitHub Action"
+          git add CITATION.cff
+          if git diff --staged --quiet; then
+            echo "No changes to commit"
+          else
+            git commit -m "Update citation.cff version"
+            git push
+          fi
+
   build:
     name: Build distribution
+    needs:
+      - sync-citation
     runs-on: ubuntu-latest
 
     steps:

README.md

Lines changed: 12 additions & 42 deletions

@@ -6,39 +6,13 @@
 [![License](https://img.shields.io/badge/dynamic/toml?url=https://raw.githubusercontent.com/finite-sample/alsgls/main/pyproject.toml&query=$.project.license.text&label=License)](https://opensource.org/licenses/MIT)
 
 
-When a GLS problem involves hundreds of equations, the $K × K$ covariance matrix becomes the computational bottleneck. A simple statistical remedy is to assume that most of the cross‑equation dependence can be captured by a *handful of latent factors* plus equation‑specific noise. This "low‑rank + diagonal" assumption slashes the number of unknowns from roughly $K^2$ to about $K×k$ parameters, where **k** (the latent factor rank) is much smaller than $K$. The model alone, however, does **not** guarantee speed: we still have to fit the parameters.
-
-### Installation
-
-Install the library from PyPI:
-
-```bash
-pip install alsgls
+```{include} docs/source/_snippets/synopsis.md
 ```
 
-For local development, clone the repo and use an editable install:
-
-```bash
-pip install -e .
+```{include} docs/source/_snippets/installation.md
 ```
 
-### Usage
-
-```python
-from alsgls import ALSGLS, ALSGLSSystem, simulate_sur
-
-Xs_tr, Y_tr, Xs_te, Y_te = simulate_sur(N_tr=240, N_te=120, K=60, p=3, k=4)
-
-# Scikit-learn style estimator
-est = ALSGLS(rank="auto", max_sweeps=12)
-est.fit(Xs_tr, Y_tr)
-test_score = est.score(Xs_te, Y_te)  # negative test NLL per observation
-
-# Statsmodels-style system interface
-system = {f"eq{j}": (Y_tr[:, j], Xs_tr[j]) for j in range(Y_tr.shape[1])}
-sys_model = ALSGLSSystem(system, rank="auto")
-sys_results = sys_model.fit()
-params = sys_results.params_as_series()  # pandas optional
+```{include} docs/source/_snippets/basic_usage.md
 ```
 
 See `examples/compare_als_vs_em.py` for a complete ALS versus EM comparison. The

@@ -50,29 +24,26 @@ peak memory (via Memray, Fil, or the POSIX RSS high-water mark).
 
 Background material and reproducible experiments are available in the notebooks under [`als_sim/`](als_sim/), such as [`als_sim/als_comparison.ipynb`](als_sim/als_comparison.ipynb) and [`als_sim/als_sur.ipynb`](als_sim/als_sur.ipynb).
 
-### Solving low‑rank GLS: EM versus ALS
+### Solving low‑rank GLS: EM versus ALS
 
-The classic EM algorithm alternates between updating the regression coefficients $\beta$ and updating the factor loadings $F$ and the diagonal noise $D$. Even though $\hat{\Sigma}$ is low‑rank, EMs M‑step recreates the **full** $K × K$ inverse, wiping out the memory win.
+The classic EM algorithm alternates between updating the regression coefficients $\beta$ and updating the factor loadings $F$ and the diagonal noise $D$. Even though $\hat{\Sigma}$ is low‑rank, EM's M‑step recreates the **full** $K \times K$ inverse, wiping out the memory win.
 
-An alternative is **Alternating‑Least‑Squares (ALS)**. The Woodbury identity reduces the expensive inverse to a tiny k × k system, and the β‑update can be written without explicitly forming the dense matrix at all. In practice, ALS converges in 5–6 sweeps and never allocates more than $O(Kk)$ memory, while EM allocates $O(K^²)$.
+An alternative is **Alternating‑Least‑Squares (ALS)**. The Woodbury identity reduces the expensive inverse to a tiny k × k system, and the β‑update can be written without explicitly forming the dense matrix at all. In practice, ALS converges in 5–6 sweeps and never allocates more than $O(Kk)$ memory, while EM allocates $O(K^2)$.
 
 **Rule of thumb:** if your GLS routine keeps looping between $\beta$ and a fresh $\hat{\Sigma}$, replacing the $\hat{\Sigma}$‑update by a factor‑ALS step yields the same statistical fit with an order‑of‑magnitude smaller memory footprint.
 
 ### Beyond SUR: where the idea travels
 
-Random‑effects models, feasible GLS with estimated heteroskedastic weights, optimal‑weight GMM, and spatial autoregressive GLS all iterate β ↔ Σ̂. Each can adopt the same ALS trick: treat the weight matrix as low‑rank + diagonal, invert only the k × k core, and avoid the dense K × K algebra. Memory savings in published examples range from 5× to 20×, depending on k.
+Random‑effects models, feasible GLS with estimated heteroskedastic weights, optimal‑weight GMM, and spatial autoregressive GLS all iterate β ↔ Σ̂. Each can adopt the same ALS trick: treat the weight matrix as low‑rank + diagonal, invert only the k × k core, and avoid the dense K × K algebra. Memory savings in published examples range from 5× to 20×, depending on k.
 
 ### A concrete case‑study: Seemingly‑Unrelated Regressions
 
-To show the magnitude, we ran a Monte‑Carlo experiment with N = 300 observations, three regressors, rank‑3 factors, and K set to 50,80,120. EM was given 45 iterations; ALS, six sweeps. The largest array EM holds is the dense Σ⁻¹, whereas ALSs largest is the skinny factor matrix F. The table summarises six replications:
+To show the magnitude, we ran a Monte‑Carlo experiment with N = 300 observations, three regressors, rank‑3 factors, and K set to 50, 80, 120. EM was given 45 iterations; ALS, six sweeps. The largest array EM holds is the dense Σ⁻¹, whereas ALS's largest is the skinny factor matrix F. The table summarises six replications:
 
-| K | β‑RMSE EM | β‑RMSE ALS | Peak MB EM | Peak MB ALS | Memory ratio |
-| --: | :-------: | :--------: | ---------: | ----------: | -----------: |
-| 50 | 0.021 | 0.021 | 0.020 | 0.002 | 10× |
-| 80 | 0.020 | 0.020 | 0.051 | 0.003 | 17× |
-| 120 | 0.020 | 0.020 | 0.115 | 0.004 | 29× |
+```{include} docs/source/_snippets/performance_table.md
+```
 
-Statistically, the two estimators are indistinguishable (paired‑test p ≥ 0.14). Computationally, ALS needs only a few megabytes whereas EM needs tens to hundreds.
+Statistically, the two estimators are indistinguishable (paired‑test p ≥ 0.14). Computationally, ALS needs only a few megabytes whereas EM needs tens to hundreds.
 
 ### Defaults, tuning knobs, and failure modes
 

@@ -93,5 +64,4 @@
 can make the β-step CG solve slow; adjust `cg_tol`/`cg_maxit`, add stronger
 ridge, or re-scale predictors. If `info["accept_t"]` stays at zero and the
 NLL does not improve, the factor rank may be too large relative to the sample
-size.
-
+size.

docs/source/_snippets/basic_usage.md

Lines changed: 18 additions & 0 deletions

## Usage

```python
from alsgls import ALSGLS, ALSGLSSystem, simulate_sur

Xs_tr, Y_tr, Xs_te, Y_te = simulate_sur(N_tr=240, N_te=120, K=60, p=3, k=4)

# Scikit-learn style estimator
est = ALSGLS(rank="auto", max_sweeps=12)
est.fit(Xs_tr, Y_tr)
test_score = est.score(Xs_te, Y_te)  # negative test NLL per observation

# Statsmodels-style system interface
system = {f"eq{j}": (Y_tr[:, j], Xs_tr[j]) for j in range(Y_tr.shape[1])}
sys_model = ALSGLSSystem(system, rank="auto")
sys_results = sys_model.fit()
params = sys_results.params_as_series()  # pandas optional
```
docs/source/_snippets/installation.md

Lines changed: 13 additions & 0 deletions

## Installation

Install the library from PyPI:

```bash
pip install alsgls
```

For local development, clone the repo and use an editable install:

```bash
pip install -e .
```
docs/source/_snippets/performance_table.md

Lines changed: 5 additions & 0 deletions

| K | β‑RMSE EM | β‑RMSE ALS | Peak MB EM | Peak MB ALS | Memory ratio |
| --: | :-------: | :--------: | ---------: | ----------: | -----------: |
| 50 | 0.021 | 0.021 | 0.020 | 0.002 | 10× |
| 80 | 0.020 | 0.020 | 0.051 | 0.003 | 17× |
| 120 | 0.020 | 0.020 | 0.115 | 0.004 | 29× |

docs/source/_snippets/synopsis.md

Lines changed: 1 addition & 0 deletions

When a GLS problem involves hundreds of equations, the $K \times K$ covariance matrix becomes the computational bottleneck. A simple statistical remedy is to assume that most of the cross‑equation dependence can be captured by a *handful of latent factors* plus equation‑specific noise. This "low‑rank + diagonal" assumption slashes the number of unknowns from roughly $K^2$ to about $K \times k$ parameters, where **k** (the latent factor rank) is much smaller than $K$. The model alone, however, does **not** guarantee speed: we still have to fit the parameters.
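
To make the parameter-count claim concrete, here is a small back-of-the-envelope check (plain Python written for this page, not package code; the K and k values are illustrative):

```python
# Unknowns in the error covariance, counted two ways.
# Dense symmetric K×K: K*(K+1)/2 free parameters (roughly K^2 / 2).
# Low-rank + diagonal (F is K×k, D is length K): K*k + K free parameters.
k = 4  # illustrative latent rank
for K in (50, 80, 120):
    dense = K * (K + 1) // 2
    low_rank = K * k + K
    print(f"K={K:3d}  dense={dense:6d}  low-rank+diag={low_rank:4d}  ≈{dense / low_rank:.0f}× fewer")
```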

docs/source/als_vs_em.md

Lines changed: 139 additions & 0 deletions

# ALS vs EM Comparison

This section compares the Alternating Least Squares (ALS) and Expectation-Maximization (EM)
approaches for low-rank+diagonal GLS estimation.

## Mathematical Background

Both algorithms solve the same statistical problem: estimating regression coefficients β
when the error covariance has a low-rank+diagonal structure:

Σ = FF^T + diag(D)

where F is the K×k matrix of factor loadings and D is the length-K vector of idiosyncratic noise variances.
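
As a quick sanity check on this structure (a self-contained numpy sketch written for this page, not package code), data generated from a latent-factor model has exactly this covariance:

```python
import numpy as np

# If y = F z + e with z ~ N(0, I_k) and e ~ N(0, diag(D)),
# then Cov(y) = F F^T + diag(D).
rng = np.random.default_rng(0)
K, k, n = 8, 2, 200_000

F = rng.normal(size=(K, k))         # factor loadings
D = rng.uniform(0.5, 1.5, size=K)   # idiosyncratic variances

z = rng.normal(size=(n, k))
e = rng.normal(size=(n, K)) * np.sqrt(D)
y = z @ F.T + e

Sigma_model = F @ F.T + np.diag(D)
Sigma_hat = np.cov(y, rowvar=False)
print(np.max(np.abs(Sigma_hat - Sigma_model)))  # small, shrinking as n grows
```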

## EM Algorithm

The EM algorithm alternates between:

- **E-step**: Compute expected sufficient statistics given current parameters
- **M-step**: Update parameters by solving weighted least squares with the full Σ^(-1)

The critical issue: EM's M-step explicitly forms the K×K inverse covariance matrix,
requiring O(K²) memory even though the model has only O(Kk) parameters.

```python
# EM's expensive step (pseudocode)
Sigma = F @ F.T + np.diag(D)      # K×K dense matrix
Sigma_inv = np.linalg.inv(Sigma)  # K×K inversion
# Use Sigma_inv for the β updates...
```

## ALS Algorithm

ALS also alternates between updating β and updating (F, D), but uses the Woodbury
matrix identity to avoid forming dense matrices:

Σ^(-1) = D^(-1) - D^(-1)F(I + F^T D^(-1)F)^(-1)F^T D^(-1)

The key insight: we only need to invert a k×k matrix (I + F^T D^(-1)F), not K×K.

```python
# ALS's efficient step (pseudocode)
D_inv = 1.0 / D                                                     # K×1 vector
small_inv = np.linalg.inv(np.eye(k) + F.T @ (D_inv[:, None] * F))   # k×k inverse
# Apply the Woodbury formula without ever forming a K×K matrix
```
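
To see why this is enough, the following self-contained check (plain numpy, independent of the package internals) confirms that the Woodbury route reproduces the dense solve while only ever factorizing a k×k matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
K, k = 200, 5
F = rng.normal(size=(K, k))
D = rng.uniform(0.5, 2.0, size=K)
r = rng.normal(size=K)  # e.g. a residual vector to be whitened

# Dense route (what EM effectively pays for): build and solve with the K×K matrix.
Sigma = F @ F.T + np.diag(D)
x_dense = np.linalg.solve(Sigma, r)

# Woodbury route: only a k×k system is ever solved.
Dinv_r = r / D
FD = F / D[:, None]                           # D^(-1) F, still K×k
core = np.eye(k) + F.T @ FD                   # k×k matrix
x_woodbury = Dinv_r - FD @ np.linalg.solve(core, F.T @ Dinv_r)

print(np.max(np.abs(x_dense - x_woodbury)))   # agreement to machine precision
```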

## Memory Comparison

| Algorithm | Largest Array | Memory Complexity |
|-----------|---------------|-------------------|
| EM | Σ^(-1) (K×K dense) | O(K²) |
| ALS | F (K×k skinny) | O(Kk) |

For K=100 equations and k=5 factors:

- EM: 100×100 = 10,000 floats
- ALS: 100×5 = 500 floats
- **Ratio: 20×**
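
The same arithmetic in a couple of lines (float64, 8 bytes per entry; a rough count that ignores temporaries):

```python
K, k = 100, 5
mb_em = K * K * 8 / 1e6   # dense Σ^(-1): 0.08 MB
mb_als = K * k * 8 / 1e6  # skinny F:     0.004 MB
print(f"{mb_em:.3f} MB vs {mb_als:.3f} MB  ->  {mb_em / mb_als:.0f}× smaller")
```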

## Computational Comparison

```python
from alsgls import simulate_sur, als_gls, em_gls
import time

# Generate a test problem
K = 100  # equations
N = 300  # observations
k = 5    # factors
# Keep the test split; it is reused in the statistical-equivalence check below.
Xs_tr, Y_tr, Xs_te, Y_te = simulate_sur(N_tr=N, N_te=50, K=K, p=3, k=k)

# Time ALS
t0 = time.time()
B_als, F_als, D_als, mem_als, info_als = als_gls(
    Xs_tr, Y_tr, k=k, sweeps=8
)
time_als = time.time() - t0

# Time EM
t0 = time.time()
B_em, F_em, D_em, mem_em, info_em = em_gls(
    Xs_tr, Y_tr, k=k, iters=30
)
time_em = time.time() - t0

print(f"ALS: {time_als:.2f}s, {mem_als:.1f}MB")
print(f"EM:  {time_em:.2f}s, {mem_em:.1f}MB")
print(f"Memory ratio: {mem_em/mem_als:.1f}×")
```

## Statistical Equivalence

Despite different computational approaches, both algorithms optimize the same
likelihood and produce statistically indistinguishable estimates:

```python
from alsgls import XB_from_Blist, mse
import numpy as np

# Compare predictions on the held-out test split generated above
Y_pred_als = XB_from_Blist(Xs_te, B_als)
Y_pred_em = XB_from_Blist(Xs_te, B_em)

# MSE should be nearly identical
mse_als = mse(Y_te, Y_pred_als)
mse_em = mse(Y_te, Y_pred_em)

print(f"ALS MSE: {mse_als:.6f}")
print(f"EM MSE:  {mse_em:.6f}")
print(f"Difference: {abs(mse_als - mse_em):.2e}")

# Factor structures should be equivalent (up to rotation)
cov_als = F_als @ F_als.T + np.diag(D_als)
cov_em = F_em @ F_em.T + np.diag(D_em)
print(f"Covariance difference: {np.linalg.norm(cov_als - cov_em):.2e}")
```

## When to Use Each

**Use ALS when:**

- K is large (>50 equations)
- Memory is constrained
- You need the Woodbury form for other computations
- k << K (the low-rank assumption holds)

**Use EM when:**

- K is small (<30 equations)
- You need the full covariance matrix anyway
- Implementing a standard EM framework
- Debugging (EM is conceptually simpler)
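
Condensed into a rough heuristic (a sketch for this page only, not part of the `alsgls` API; the thresholds are simply the ones quoted above):

```python
def suggest_solver(K: int, k: int) -> str:
    """Pick a routine name based on the rules of thumb above."""
    if K > 50 and 10 * k <= K:  # large system with a genuinely low rank
        return "als_gls"
    if K < 30:                  # small system: dense EM is cheap and easy to debug
        return "em_gls"
    return "als_gls"            # otherwise default to the memory-light option


print(suggest_solver(K=120, k=4))  # als_gls
print(suggest_solver(K=20, k=2))   # em_gls
```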

## Implementation Details

The package provides both for comparison:

- `als_gls()`: Memory-efficient ALS implementation
- `em_gls()`: Baseline EM implementation

Both use the same convergence criteria and regularization options for fair comparison.
