d-morrison · Copilot · Jun 9, 2026 · Jun 9, 2026 · Jun 9, 2026 · Jun 19, 2026
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -62,6 +62,8 @@ Suggests:
     sjPlot,
     equatiomatic,
     broom (>= 1.0.8),
+    GGally,
+    regress3d,
     lmtest,
     gh,
     lintr,
@@ -79,4 +81,5 @@ Remotes:
     ddsjoberg/gtsummary,
     insightsengineering/cardx,
     insightsengineering/cards,
-    d-morrison/rmb
+    d-morrison/rmb,
+    d-morrison/regress3d@158d63cee6d823eccb5b0fc45e11f013ea0097d7
diff --git a/_subfiles/Linear-models-overview/_sec_hers_data.qmd b/_subfiles/Linear-models-overview/_sec_hers_data.qmd
@@ -0,0 +1,149 @@
+:::{.callout-note}
+This section is based on [@vittinghoff2e, Chapter 4].
+:::
+
+::: notes
+
+{{< include _subfiles/shared/_sec_hers_intro.qmd >}}
+
+:::
+
+```{r}
+#| eval: false
+#| code-fold: show
+library(haven)
+hers <- haven::read_dta(
+  paste0(
+    "https://regression.ucsf.edu/sites/g/files",
+    "/tkssra6706/f/wysiwyg/home/data/hersdata.dta"
+  )
+)
+```
+
+```{r}
+#| include: false
+library(haven)
+hers <-
+  fs::path_package("rme", "extdata/hersdata.dta") |>
+  read_dta() |>
+  dplyr::mutate(
+    HT = as_factor(HT) |>
+      relevel(ref = "placebo"),
+    statins = as_factor(statins) |>
+      relevel(ref = "no")
+  )
+```
+
+::::: {.panel-tabset}
+
+#### Data as table
+
+:::{#tbl-hers-ch4}
+
+```{r}
+#| code-fold: true
+hers |> head()
+```
+
+`hers` data
+
+:::
+
+#### Data as graph
+
+:::{#fig-hers-scatter}
+
+```{r}
+#| code-fold: true
+hers_scatter_data <- hers |>
+  dplyr::filter(!is.na(age), !is.na(BMI), !is.na(LDL), !is.na(statins))
+plotly::plot_ly(
+  x = hers_scatter_data[["age"]],
+  y = hers_scatter_data[["BMI"]],
+  z = hers_scatter_data[["LDL"]],
+  color = as.character(hers_scatter_data[["statins"]]),
+  colors = c("no" = "steelblue", "yes" = "darkorange"),
+  type = "scatter3d",
+  mode = "markers",
+  marker = list(size = 3, opacity = 0.5)
+) |>
+  plotly::layout(
+    scene = list(
+      xaxis = list(title = "Age (yr)"),
+      yaxis = list(title = "BMI (kg/m²)"),
+      zaxis = list(title = "LDL (mg/dL)")
+    ),
+    legend = list(title = list(text = "Statins"))
+  )
+```
+
+`hers` data (@vittinghoff2e):
+age (years) and BMI (kg/m²) vs. baseline LDL (mg/dL),
+colored by statin use.
+
+:::
+
+#### Key variables
+
+:::{#fig-hers-key-vars}
+
+```{r}
+#| code-fold: true
+#| fig-height: 7
+#| fig-width: 8
+library(GGally)
+hers |>
+  dplyr::select(LDL, HT, BMI, statins, age) |>
+  ggpairs(
+    mapping = aes(col = statins),
+    lower = list(continuous = GGally::wrap("points", alpha = 0.3)),
+    columnLabels = c(
+      "LDL (mg/dL)",
+      "HT",
+      "BMI (kg/m²)",
+      "Statins",
+      "Age (yr)"
+    )
+  ) +
+  theme_bw() +
+  theme(legend.position = "bottom")
+```
+
+Key variables in `hers` data: outcome (LDL),
+treatment (HT), and covariates (BMI, statins, age)
+
+:::
+
+:::::
+
+{{< slidebreak >}}
+
+#### Data notation {.smaller}
+
+::: notes
+Let's define some notation to represent this data:
+:::
+
+-   $Y$: LDL cholesterol (mg/dL)
+-   $A$: age (years)
+-   $B$: BMI (kg/m²)
+-   $T$: hormone therapy treatment assignment
+    ("placebo" or "hormone therapy")
+-   $H$: indicator variable for $T$ = "hormone therapy"
+    -   $H = 0$ if $T$ = "placebo"
+    -   $H = 1$ if $T$ = "hormone therapy"
+-   $U$: statin use ("no" or "yes")
+-   $V$: indicator variable for $U$ = "yes"
+    -   $V = 0$ if $U$ = "no"
+    -   $V = 1$ if $U$ = "yes"
+
+::: notes
+"Placebo" is the **reference level** for the categorical variable $T$,
+and "no" is the **reference level** for statin use $U$.
+The choice of reference level is arbitrary;
+it only affects the interpretation of the intercept and corresponding indicator coefficients.
+
+Since LDL is measured at **baseline** (before the hormone therapy was administered),
+$H$ is not included as a predictor in our regression models for LDL.
+We instead focus on statin use $U$ (and its indicator $V$) as the key grouping variable.
+:::
diff --git a/_subfiles/Linear-models-overview/_sec_hers_lm_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_hers_lm_diagnostics.qmd
@@ -0,0 +1,48 @@
+::: notes
+We check whether the assumptions of the parallel-planes model (`hers_lm1`) hold
+by examining residuals vs. fitted values (to assess linearity and constant variance)
+and a QQ plot (to assess normality of residuals).
+Plots are faceted by statin use to check for group-level residual patterns.
+:::
+
+#### Residuals vs fitted for `hers_lm1`
+
+:::{#fig-hers-resid-fitted}
+
+```{r}
+#| code-fold: true
+hers_diag <- hers |>
+  dplyr::mutate(
+    .fitted = fitted(hers_lm1),
+    .resid = residuals(hers_lm1)
+  )
+
+ggplot(hers_diag, aes(x = .fitted, y = .resid)) +
+  geom_point(alpha = 0.3) +
+  geom_hline(yintercept = 0, linetype = "dashed") +
+  facet_wrap(~statins, labeller = label_both) +
+  xlab("Fitted values") +
+  ylab("Residuals") +
+  theme_bw()
+```
+
+Residuals vs fitted values for `hers_lm1` (parallel planes model)
+
+:::
+
+#### QQ plot for `hers_lm1`
+
+:::{#fig-hers-qq}
+
+```{r}
+#| code-fold: true
+ggplot(hers_diag, aes(sample = .resid)) +
+  stat_qq() +
+  stat_qq_line() +
+  facet_wrap(~statins, labeller = label_both) +
+  theme_bw()
+```
+
+QQ plot of residuals for `hers_lm1` (parallel planes model)
+
+:::
diff --git a/_subfiles/Linear-models-overview/_sec_hers_lm_diagnostics_lm2.qmd b/_subfiles/Linear-models-overview/_sec_hers_lm_diagnostics_lm2.qmd
@@ -0,0 +1,45 @@
+::: notes
+We repeat the same checks for the interaction model (`hers_lm2`).
+:::
+
+#### Residuals vs fitted for `hers_lm2`
+
+:::{#fig-hers-resid-fitted-lm2}
+
+```{r}
+#| code-fold: true
+hers_diag2 <- hers |>
+  dplyr::mutate(
+    .fitted = fitted(hers_lm2),
+    .resid = residuals(hers_lm2)
+  )
+
+ggplot(hers_diag2, aes(x = .fitted, y = .resid)) +
+  geom_point(alpha = 0.3) +
+  geom_hline(yintercept = 0, linetype = "dashed") +
+  facet_wrap(~statins, labeller = label_both) +
+  xlab("Fitted values") +
+  ylab("Residuals") +
+  theme_bw()
+```
+
+Residuals vs fitted values for `hers_lm2` (interaction model)
+
+:::
+
+#### QQ plot for `hers_lm2`
+
+:::{#fig-hers-qq-lm2}
+
+```{r}
+#| code-fold: true
+ggplot(hers_diag2, aes(sample = .resid)) +
+  stat_qq() +
+  stat_qq_line() +
+  facet_wrap(~statins, labeller = label_both) +
+  theme_bw()
+```
+
+QQ plot of residuals for `hers_lm2` (interaction model)
+
+:::
diff --git a/_subfiles/Linear-models-overview/_sec_hers_lm_gof.qmd b/_subfiles/Linear-models-overview/_sec_hers_lm_gof.qmd
@@ -0,0 +1,54 @@
+::: notes
+We compare the parallel-planes model (`hers_lm1`: `LDL ~ age + BMI`)
+and the interaction model (`hers_lm2`: `LDL ~ age + BMI + age:BMI`)
+using AIC, BIC, and deviance.
+Lower AIC and BIC values indicate a better trade-off between fit and complexity;
+lower deviance indicates a better-fitting model (at the cost of more parameters).
+:::
+
+#### AIC and BIC for `hers` models
+
+::: {#tbl-hers-aic-bic}
+
+```{r}
+#| code-fold: true
+aic <- AIC(hers_lm1, hers_lm2)
+bic <- BIC(hers_lm1, hers_lm2)
+data.frame(
+  df  = aic$df,
+  AIC = aic$AIC,
+  BIC = bic$BIC,
+  row.names = rownames(aic)
+) |>
+  knitr::kable()
+```
+
+AIC and BIC for the `hers` parallel-planes (`hers_lm1`) and interaction (`hers_lm2`) models
+
+:::
+
+The model with the lower AIC or BIC value is preferred as the better balance of fit and parsimony.
+A smaller value for the interaction model indicates the added age-BMI interaction term is worthwhile;
+a smaller value for the parallel-planes model would favor the simpler specification.
+
+
+#### Deviance for `hers` models
+
+::: {#tbl-hers-deviance}
+
+```{r}
+#| code-fold: true
+data.frame(
+  deviance = c(deviance(hers_lm1), deviance(hers_lm2)),
+  row.names = c("hers_lm1", "hers_lm2")
+) |>
+  knitr::kable()
+```
+
+Deviance for the `hers` parallel-planes (`hers_lm1`) and interaction (`hers_lm2`) models
+
+:::
+
+Lower deviance indicates a closer fit to the data; the interaction model uses one more parameter.
+The likelihood ratio test in @tbl-hers-lrt provides a formal comparison of whether the improvement justifies the added complexity.
+