Skip to content
Open
2 changes: 2 additions & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -46,9 +46,11 @@ Suggests:
ggdag,
ggplot2,
glmx,
gratia (>= 0.9),
haven,
kableExtra,
KMsurv,
mgcv,
muhaz,
palmerpenguins,
parameters,
Expand Down
1 change: 1 addition & 0 deletions _quarto-book.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ book:
- chapters/causal-inference.qmd
- chapters/predictor-selection.qmd
- chapters/intro-multilevel-models.qmd
- chapters/generalized-additive-models.qmd
- part: chapters/time-to-event-models.qmd
chapters:
- chapters/intro-to-survival-analysis.qmd
Expand Down
3 changes: 3 additions & 0 deletions _quarto-website.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ project:
- chapters/causal-inference.qmd
- chapters/predictor-selection.qmd
- chapters/intro-multilevel-models.qmd
- chapters/generalized-additive-models.qmd
- chapters/time-to-event-models.qmd
- chapters/intro-to-survival-analysis.qmd
- chapters/proportional-hazards-models.qmd
Expand Down Expand Up @@ -77,6 +78,8 @@ website:
href: chapters/predictor-selection.qmd
- text: "Multilevel Models"
href: chapters/intro-multilevel-models.qmd
- text: "Generalized Additive Models"
href: chapters/generalized-additive-models.qmd
- text: "---"
- text: "Time to Event Models"
href: chapters/time-to-event-models.qmd
Expand Down
160 changes: 160 additions & 0 deletions _subfiles/generalized-additive-models/_exr-gams-applied.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
::: notes
Applied exercises adapted from @james2021islr2e [§7.9] and
@hastie2009esl2e [§9.1].
:::

### Polynomial regression and step functions (ISLRv2 Applied 7) {#sec-exr-wage-poly}

::::{#exr-wage-polynomial-cv}

Using the `Wage` dataset from the `ISLR2` package [@james2021islr2e]:

**(a)** Use 10-fold cross-validation to choose the polynomial degree $d^*$ for
predicting `wage` from `age`. Plot the CV errors for degrees 1 through 10.

**(b)** Fit the degree-$d^*$ polynomial and plot it with a 95% confidence band.

**(c)** Separately, choose the number of cut-points $K^*$ for a step-function
model using cross-validation. Plot the chosen step-function fit.

::::


::::{#sol-wage-polynomial-cv}

::: {.panel-tabset}

#### R

```{r}
#| code-fold: true
#| eval: false
library(ISLR2)
library(boot)
library(ggplot2)
set.seed(1)
cv_poly <- vapply(1:10, function(d) {
cv.glm(Wage, glm(wage ~ poly(age, d), data = Wage), K = 10)$delta[1]
}, numeric(1))
best_d <- which.min(cv_poly)
cat("CV-selected degree:", best_d, "\n")
plot(1:10, cv_poly, type = "b", xlab = "Degree", ylab = "CV MSE",
main = "Polynomial degree selection")
points(best_d, cv_poly[best_d], col = "red", pch = 19)
```

#### Python

```{python}
#| eval: false
#| python.reticulate: false
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from ISLP import load_data
Wage = load_data("Wage")
X = Wage["age"].values.reshape(-1, 1)
y = Wage["wage"].values
cv_errs = [-cross_val_score(
Pipeline([("poly", PolynomialFeatures(d, include_bias=False)),
("reg", LinearRegression())]),
X, y, cv=10, scoring="neg_mean_squared_error").mean() for d in range(1, 11)]
best_d = int(np.argmin(cv_errs)) + 1
print(f"CV-selected degree: {best_d}")
```

:::

::::


### Backfitting algorithm (ESL §9.1.1 / ISLRv2 Applied 11) {#sec-exr-backfitting}

::::{#exr-backfitting}

The **backfitting algorithm** [@hastie2009esl2e, §9.1.1] fits an additive model
$Y = \a + \sum_j f_j(X_j) + \eps$ by iteratively smoothing partial residuals.

For $p=2$ predictors with linear component smoothers $\hat f_j(X_j) = \hat\b_j X_j$:

1. Initialise: $\hat\a = \bar{y}$, $\hat f_1 = \hat f_2 = 0$.
2. Repeat until convergence (Gauss–Seidel order — re-centre each
$\hat f_j$ immediately, so the centred $\hat f_1$ is used when
computing $r_2$):
a. $r_1 \leftarrow y - \hat\a - \hat f_2(x_2)$; regress $r_1$ on
$x_1$ to update $\hat f_1$; $\hat f_1 \leftarrow \hat f_1 -
\overline{\hat f_1}$.
b. $r_2 \leftarrow y - \hat\a - \hat f_1(x_1)$; regress $r_2$ on
$x_2$ to update $\hat f_2$; $\hat f_2 \leftarrow \hat f_2 -
\overline{\hat f_2}$.

**(a)** Implement this on simulated data with known true coefficients.

**(b)** Verify the coefficients match `lm(y ~ x1 + x2)`.

::::


::::{#sol-backfitting}

::: {.panel-tabset}

#### R

```{r}
#| code-fold: true
#| eval: false
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n, sd = 0.5)
alpha_hat <- mean(y)
f1 <- f2 <- rep(0, n)
for (iter in seq_len(500)) {
r1 <- y - alpha_hat - f2
b1 <- coef(lm(r1 ~ x1))[2]
f1 <- b1 * x1
f1 <- f1 - mean(f1)
r2 <- y - alpha_hat - f1
b2 <- coef(lm(r2 ~ x2))[2]
f2 <- b2 * x2
f2 <- f2 - mean(f2)
}
lm_fit <- lm(y ~ x1 + x2)
cat(sprintf("Backfitting: b1=%.4f, b2=%.4f\n", b1, b2))
cat(sprintf("lm(): b1=%.4f, b2=%.4f\n",
coef(lm_fit)[2], coef(lm_fit)[3]))
```

#### Python

```{python}
#| eval: false
#| python.reticulate: false
import numpy as np
rng = np.random.default_rng(1)
n = 100; x1 = rng.standard_normal(n); x2 = rng.standard_normal(n)
y = 2 + 3*x1 - 1.5*x2 + rng.normal(0, 0.5, n)
alpha_hat = y.mean(); f1 = np.zeros(n); f2 = np.zeros(n)
for _ in range(500):
r1 = y - alpha_hat - f2
b1 = float(np.cov(x1, r1)[0, 1] / np.var(x1, ddof=1))
f1 = b1*x1 - (b1*x1).mean()
r2 = y - alpha_hat - f1
b2 = float(np.cov(x2, r2)[0, 1] / np.var(x2, ddof=1))
f2 = b2*x2 - (b2*x2).mean()
X_ols = np.column_stack([np.ones(n), x1, x2])
b_ols = np.linalg.lstsq(X_ols, y, rcond=None)[0]
print(f"Backfitting: b1={b1:.4f}, b2={b2:.4f}")
print(f"OLS: b1={b_ols[1]:.4f}, b2={b_ols[2]:.4f}")
```

:::

The coefficients agree, confirming backfitting converges to OLS when
all component smoothers are linear [@hastie2009esl2e, §9.1.1].

::::
76 changes: 76 additions & 0 deletions _subfiles/generalized-additive-models/_exr-gams-conceptual.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
::: notes
Exercises adapted from @james2021islr2e [§7.9] and @hastie2009esl2e [§9.1].
:::

### Piecewise polynomial continuity (ISLRv2 Conceptual 1) {#sec-exr-continuity}

::::{#exr-spline-continuity}

A cubic polynomial with one interior knot at $\xi$ uses the truncated power basis:

$$f(X) = \b_0 + \b_1 X + \b_2 X^2 + \b_3 X^3 + \b_4 (X-\xi)_+^3$$

where $(u)_+ = \max(u,0)$.

**(a)** Write $f(X)$ as two cubics: $f_1(X)$ for $X < \xi$ and $f_2(X)$ for $X \ge \xi$.

**(b)** Show $f_1(\xi) = f_2(\xi)$ (continuity).

**(c)** Show the first derivatives match at $\xi$.

**(d)** Show the second derivatives match at $\xi$.

**(e)** Do the third derivatives match? What order of smoothness does this cubic spline achieve?

::::


::::{#sol-spline-continuity}

**(a)** For $X < \xi$: $(X-\xi)_+ = 0$, giving $f_1(X) = \b_0 + \b_1 X + \b_2 X^2 + \b_3 X^3$.
For $X \ge \xi$: $f_2(X) = f_1(X) + \b_4(X-\xi)^3$.

**(b)** At $X=\xi$: $(X-\xi)_+^3 = 0$, so $f_2(\xi) = f_1(\xi)$. $\checkmark$

**(c)** $f_2^{\prime}(X) = f_1^{\prime}(X) + 3\b_4(X-\xi)_+^2$.
At $X=\xi$: $(X-\xi)_+^2 = 0$, so the derivatives agree. $\checkmark$

**(d)** $f_2^{\prime\prime}(X) = f_1^{\prime\prime}(X) + 6\b_4(X-\xi)_+$.
At $X=\xi$: $(X-\xi)_+ = 0$. $\checkmark$

**(e)** $f_1^{\prime\prime\prime} = 6\b_3$ and $f_2^{\prime\prime\prime} = 6\b_3 + 6\b_4$ for $X > \xi$.
These differ (unless $\b_4=0$). The truncated power basis gives a **cubic spline**:
continuous with continuous 1st and 2nd derivatives, but a jump in the 3rd derivative at each knot.

::::


### Smoothing spline penalty behavior (ISLRv2 Conceptual 2) {#sec-exr-smoothing-penalty}

::::{#exr-smoothing-penalty}

The smoothing spline minimizes

$$\hat{g} = \arg\min_g \left\{ \sum_{i=1}^n (y_i - g(x_i))^2 + \l \int g^{\prime\prime}(t)^2 \, dt \right\}.$$

**(a)** What is the limiting form of $\hat{g}$ as $\l \to \infty$?

**(b)** What is the limiting form as $\l \to 0$?

**(c)** For $\l \to 0$, is the training RSS equal to zero?

::::


::::{#sol-smoothing-penalty}

**(a)** As $\l \to \infty$ the roughness penalty dominates. The only $g$ with
$\int g^{\prime\prime}(t)^2\,dt = 0$ is a linear function $g(x) = a + bx$.
So $\hat{g}$ converges to the ordinary least-squares line.

**(b)** As $\l \to 0$ there is no penalty; the minimizer interpolates every
training point: $g(x_i) = y_i$ for all $i$.

**(c)** Yes. Interpolation means training RSS $= 0$ -- the overfit extreme.

::::
96 changes: 96 additions & 0 deletions _subfiles/generalized-additive-models/_sec_basis_functions.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
::: notes
Adapted from @james2021islr2e [§7.3] and @wood2017generalized [§4].
:::

Polynomial regression and step-function regression look very
different, but they share a common structure: both replace $X$ with
a fixed set of **transformations** of $X$, then fit a linear
regression in those transformations. This shared structure is the
**basis function** framework, and it's the foundation for splines.

::::{#def-basis-expansion}
#### Basis expansion

A **basis expansion** of $X$ is a fixed (data-free) set of functions
$b_1(X), \ldots, b_K(X)$. The basis-expansion regression model
replaces $X$ with these transformations:

$$y_i = \b_0 + \b_1 b_1(x_i) + \b_2 b_2(x_i) + \cdots + \b_K b_K(x_i) + \eps_i.$$

The basis functions $b_k(\cdot)$ are chosen ahead of time; the
coefficients $\b_k$ are estimated by least squares (or, more
generally, by maximum likelihood for a GLM with link
$g(\E{Y \mid X}) = \b_0 + \sum_k \b_k b_k(X)$).
::::

::::{#exm-basis-polynomial}
#### Polynomial regression as a basis expansion

For polynomial regression of degree $d$ (@def-polynomial-regression),
the basis functions are the monomials

$$b_k(X) = X^k, \quad k = 1, \ldots, d.$$
::::

::::{#exm-basis-step}
#### Step-function regression as a basis expansion

For step-function regression with cutpoints $c_1 < \cdots < c_K$
(@def-step-function-regression), the basis functions are the bin
indicators

$$b_k(X) = \indicp{c_{k-1} < X \le c_k}, \quad k = 1, \ldots, K+1.$$
::::

## Why the basis-function abstraction matters

Once we see polynomial and step-function regression as instances of
the same template, we can ask: *which other basis functions might be
worth trying?* Two answers turn out to be especially useful:

- **Truncated power basis functions** —
$b_k(X) = (X - \xi_k)_+^d$ for chosen knots $\xi_k$. These give the
**regression spline** family, which enforces piecewise-polynomial
structure with continuity and smoothness constraints at the knots
(developed in the next section).
- **B-spline basis functions** — a numerically well-conditioned
alternative basis that spans the same space of piecewise
polynomials as the truncated power basis, but with locally
supported basis functions (each $b_k$ is nonzero only on a small
region of the $x$-axis).

Both of these are still **linear** in the coefficients, so all of the
GLM machinery — likelihood, standard errors, Wald tests, AIC, etc. —
applies unchanged. The only thing that changes is the design matrix:
each row $i$ of the design matrix has entries
$1, b_1(x_i), \ldots, b_K(x_i)$ instead of $1, x_i$.

## The general estimation recipe

For a continuous response $Y$ and a basis expansion
$b_1(\cdot), \ldots, b_K(\cdot)$:

1. **Build the design matrix.** Construct the $n \times (K+1)$
matrix

$$\mat{B} = \sbmat{
1 & b_1(x_1) & \cdots & b_K(x_1) \\
1 & b_1(x_2) & \cdots & b_K(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
1 & b_1(x_n) & \cdots & b_K(x_n)
}.$$

2. **Fit by least squares (or MLE for a GLM).**
The OLS estimator $\hat\b = (\tp{\mat{B}} \mat{B})^{-1} \tp{\mat{B}} \mat{y}$
minimizes the residual sum of squares — exactly the linear
regression machinery from earlier chapters, just with a different
design matrix.

3. **Plot the fitted curve.** For a dense grid of $x$ values, compute
$\hat y(x) = \hat\b_0 + \sum_k \hat\b_k b_k(x)$ and plot. Standard
errors and pointwise confidence bands come from the usual linear
model formulas applied to $\mat{B}$.

The remaining design decision is **which basis functions to use** —
and that choice is what distinguishes regression splines from
smoothing splines from local regression from GAMs.
Loading
Loading