Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,8 @@ Suggests:
sjPlot,
equatiomatic,
broom (>= 1.0.8),
GGally,
regress3d,
lmtest,
gh,
lintr,
Expand All @@ -79,4 +81,5 @@ Remotes:
ddsjoberg/gtsummary,
insightsengineering/cardx,
insightsengineering/cards,
d-morrison/rmb
d-morrison/rmb,
d-morrison/regress3d@158d63cee6d823eccb5b0fc45e11f013ea0097d7
149 changes: 149 additions & 0 deletions _subfiles/Linear-models-overview/_sec_hers_data.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,149 @@
:::{.callout-note}
This section is based on [@vittinghoff2e, Chapter 4].
:::

::: notes

{{< include _subfiles/shared/_sec_hers_intro.qmd >}}

:::

```{r}
#| eval: false
#| code-fold: show
library(haven)
hers <- haven::read_dta(
paste0(
"https://regression.ucsf.edu/sites/g/files",
"/tkssra6706/f/wysiwyg/home/data/hersdata.dta"
)
)
```

```{r}
#| include: false
library(haven)
hers <-
fs::path_package("rme", "extdata/hersdata.dta") |>
read_dta() |>
dplyr::mutate(
HT = as_factor(HT) |>
relevel(ref = "placebo"),
statins = as_factor(statins) |>
relevel(ref = "no")
)
```

::::: {.panel-tabset}

#### Data as table

:::{#tbl-hers-ch4}

```{r}
#| code-fold: true
hers |> head()
```

`hers` data

:::

#### Data as graph

:::{#fig-hers-scatter}

```{r}
#| code-fold: true
hers_scatter_data <- hers |>
dplyr::filter(!is.na(age), !is.na(BMI), !is.na(LDL), !is.na(statins))
plotly::plot_ly(
x = hers_scatter_data[["age"]],
y = hers_scatter_data[["BMI"]],
z = hers_scatter_data[["LDL"]],
color = as.character(hers_scatter_data[["statins"]]),
colors = c("no" = "steelblue", "yes" = "darkorange"),
type = "scatter3d",
mode = "markers",
marker = list(size = 3, opacity = 0.5)
) |>
plotly::layout(
scene = list(
xaxis = list(title = "Age (yr)"),
yaxis = list(title = "BMI (kg/m²)"),
zaxis = list(title = "LDL (mg/dL)")
),
legend = list(title = list(text = "Statins"))
)
```

`hers` data (@vittinghoff2e):
age (years) and BMI (kg/m²) vs. baseline LDL (mg/dL),
colored by statin use.

:::

#### Key variables

:::{#fig-hers-key-vars}

```{r}
#| code-fold: true
#| fig-height: 7
#| fig-width: 8
library(GGally)
hers |>
dplyr::select(LDL, HT, BMI, statins, age) |>
ggpairs(
mapping = aes(col = statins),
lower = list(continuous = GGally::wrap("points", alpha = 0.3)),
columnLabels = c(
"LDL (mg/dL)",
"HT",
"BMI (kg/m²)",
"Statins",
"Age (yr)"
)
) +
theme_bw() +
theme(legend.position = "bottom")
```

Key variables in `hers` data: outcome (LDL),
treatment (HT), and covariates (BMI, statins, age)

:::

:::::

{{< slidebreak >}}

#### Data notation {.smaller}

::: notes
Let's define some notation to represent this data:
:::

- $Y$: LDL cholesterol (mg/dL)
- $A$: age (years)
- $B$: BMI (kg/m²)
- $T$: hormone therapy treatment assignment
("placebo" or "hormone therapy")
- $H$: indicator variable for $T$ = "hormone therapy"
- $H = 0$ if $T$ = "placebo"
- $H = 1$ if $T$ = "hormone therapy"
- $U$: statin use ("no" or "yes")
- $V$: indicator variable for $U$ = "yes"
- $V = 0$ if $U$ = "no"
- $V = 1$ if $U$ = "yes"

::: notes
"Placebo" is the **reference level** for the categorical variable $T$,
and "no" is the **reference level** for statin use $U$.
The choice of reference level is arbitrary;
it only affects the interpretation of the intercept and corresponding indicator coefficients.
Comment thread
d-morrison marked this conversation as resolved.

Since LDL is measured at **baseline** (before the hormone therapy was administered),
$H$ is not included as a predictor in our regression models for LDL.
We instead focus on statin use $U$ (and its indicator $V$) as the key grouping variable.
:::
48 changes: 48 additions & 0 deletions _subfiles/Linear-models-overview/_sec_hers_lm_diagnostics.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
::: notes
We check whether the assumptions of the parallel-planes model (`hers_lm1`) hold
by examining residuals vs. fitted values (to assess linearity and constant variance)
and a QQ plot (to assess normality of residuals).
Plots are faceted by statin use to check for group-level residual patterns.
:::

#### Residuals vs fitted for `hers_lm1`

:::{#fig-hers-resid-fitted}

```{r}
#| code-fold: true
hers_diag <- hers |>
dplyr::mutate(
.fitted = fitted(hers_lm1),
.resid = residuals(hers_lm1)
)

ggplot(hers_diag, aes(x = .fitted, y = .resid)) +
geom_point(alpha = 0.3) +
geom_hline(yintercept = 0, linetype = "dashed") +
facet_wrap(~statins, labeller = label_both) +
xlab("Fitted values") +
ylab("Residuals") +
theme_bw()
```

Residuals vs fitted values for `hers_lm1` (parallel planes model)

:::

#### QQ plot for `hers_lm1`

:::{#fig-hers-qq}

```{r}
#| code-fold: true
ggplot(hers_diag, aes(sample = .resid)) +
stat_qq() +
stat_qq_line() +
facet_wrap(~statins, labeller = label_both) +
theme_bw()
```

QQ plot of residuals for `hers_lm1` (parallel planes model)

:::
45 changes: 45 additions & 0 deletions _subfiles/Linear-models-overview/_sec_hers_lm_diagnostics_lm2.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
::: notes
We repeat the same checks for the interaction model (`hers_lm2`).
:::

#### Residuals vs fitted for `hers_lm2`

:::{#fig-hers-resid-fitted-lm2}

```{r}
#| code-fold: true
hers_diag2 <- hers |>
dplyr::mutate(
.fitted = fitted(hers_lm2),
.resid = residuals(hers_lm2)
)

ggplot(hers_diag2, aes(x = .fitted, y = .resid)) +
geom_point(alpha = 0.3) +
geom_hline(yintercept = 0, linetype = "dashed") +
facet_wrap(~statins, labeller = label_both) +
xlab("Fitted values") +
ylab("Residuals") +
theme_bw()
```

Residuals vs fitted values for `hers_lm2` (interaction model)

:::

#### QQ plot for `hers_lm2`

:::{#fig-hers-qq-lm2}

```{r}
#| code-fold: true
ggplot(hers_diag2, aes(sample = .resid)) +
stat_qq() +
stat_qq_line() +
facet_wrap(~statins, labeller = label_both) +
theme_bw()
```

QQ plot of residuals for `hers_lm2` (interaction model)

:::
54 changes: 54 additions & 0 deletions _subfiles/Linear-models-overview/_sec_hers_lm_gof.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
::: notes
We compare the parallel-planes model (`hers_lm1`: `LDL ~ age + BMI`)
and the interaction model (`hers_lm2`: `LDL ~ age + BMI + age:BMI`)
using AIC, BIC, and deviance.
Lower AIC and BIC values indicate a better trade-off between fit and complexity;
lower deviance indicates a better-fitting model (at the cost of more parameters).
:::

#### AIC and BIC for `hers` models

::: {#tbl-hers-aic-bic}

```{r}
#| code-fold: true
aic <- AIC(hers_lm1, hers_lm2)
bic <- BIC(hers_lm1, hers_lm2)
data.frame(
df = aic$df,
AIC = aic$AIC,
BIC = bic$BIC,
row.names = rownames(aic)
) |>
knitr::kable()
```

AIC and BIC for the `hers` parallel-planes (`hers_lm1`) and interaction (`hers_lm2`) models

:::

The model with the lower AIC or BIC value is preferred as the better balance of fit and parsimony.
A smaller value for the interaction model indicates the added age-BMI interaction term is worthwhile;
a smaller value for the parallel-planes model would favor the simpler specification.


#### Deviance for `hers` models

::: {#tbl-hers-deviance}

```{r}
#| code-fold: true
data.frame(
deviance = c(deviance(hers_lm1), deviance(hers_lm2)),
row.names = c("hers_lm1", "hers_lm2")
) |>
knitr::kable()
```

Deviance for the `hers` parallel-planes (`hers_lm1`) and interaction (`hers_lm2`) models

:::

Lower deviance indicates a closer fit to the data; the interaction model uses one more parameter.
The likelihood ratio test in @tbl-hers-lrt provides a formal comparison of whether the improvement justifies the added complexity.

Loading
Loading