d-morrison · Claude · Jun 9, 2026 · Jun 9, 2026 · Jun 9, 2026 · Jun 18, 2026
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -266,6 +266,21 @@ Do not use generic acknowledgements without locators
 or plaintext author-title references
 when a BibTeX citation is available.
 
+## Observational vs Causal Estimands
+
+Always distinguish observational estimands
+from causal estimands in notation and prose.
+
+- Use observational notation
+  (for example, standardized risks based on `\E{Y \mid A=a, Z=z}`)
+  when discussing model-based associations.
+- Use potential-outcome notation
+  (for example, `\Pr(Y^a = 1)`)
+  only when making a causal claim.
+- If observational and causal estimands are equated,
+  explicitly state identification assumptions
+  (consistency, exchangeability, and positivity).
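+
+A minimal sketch of how such a statement might read
+(illustrative wording only; $Y$, $A$, and $Z$ are the outcome, exposure,
+and covariate from the bullets above, and conditioning exchangeability on $Z$
+is an assumption of this sketch):
+
+```qmd
+Under consistency, conditional exchangeability given $Z$, and positivity,
+the causal risk is identified by the standardized risk:
+
+$$
+\Pr(Y^a = 1) = \sum_z \Pr(Y = 1 \mid A = a, Z = z) \, \Pr(Z = z)
+$$
+```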
+
 ## Variable Definitions in Exercises
 
 When introducing model variables in exercises,
@@ -383,6 +398,113 @@ When introducing or editing formal statistical definitions in `.qmd` files:
   ensure those terms also have formal `#def-` div definitions
   in the relevant scope before relying on them
 
+## Slidebreaks Before Theorem-Type Divs
+
+Always add `{{< slidebreak >}}` on its own line,
+followed by a blank line, immediately before
+every theorem-type div opener.
+This keeps slide-format output readable.
+
+Theorem-type div types (per [Quarto cross-reference docs](https://quarto.org/docs/authoring/cross-references.html#theorems-and-proofs)):
+`#thm-`, `#lem-`, `#cor-`, `#prp-`, `#cnj-`, `#def-`, `#exm-`, `#exr-`, `#rem-`
+
+### Slidebreaks in including vs. included files
+
+When a subfile's **first** content is a theorem-type div,
+place the `{{< slidebreak >}}` in the **including** (parent) file,
+immediately before the `{{< include >}}` shortcode —
+**not** at the start of the included subfile itself.
+
+**Correct** — slidebreak in the including file:
+```qmd
+<!-- in the parent file -->
+{{< slidebreak >}}
+
+{{< include _subfiles/chapter/_exm-my-example.qmd >}}
+```
+
+```qmd
+<!-- in _exm-my-example.qmd — no leading slidebreak -->
+:::{#exm-my-example}
+
+#### My example title
+
+Content...
+
+:::
+```
+
+**Incorrect** — slidebreak inside the included file:
+```qmd
+<!-- in the parent file — no slidebreak -->
+{{< include _subfiles/chapter/_exm-my-example.qmd >}}
+```
+
+```qmd
+<!-- in _exm-my-example.qmd — do NOT put slidebreak here -->
+{{< slidebreak >}}
+
+:::{#exm-my-example}
+
+#### My example title
+
+Content...
+
+:::
+```
+
+**Correct** (the div is not the file's first content, so the slidebreak stays in the same file):
+```qmd
+{{< slidebreak >}}
+
+:::{#def-collapsibility}
+
+#### Collapsibility
+
+A measure is *collapsible* if ...
+
+:::
+```
+
+**Incorrect** (missing slidebreak):
+```qmd
+:::{#def-collapsibility}
+
+#### Collapsibility
+
+A measure is *collapsible* if ...
+
+:::
+```
+
+## Example Formatting
+
+All worked examples in `.qmd` files must be wrapped in a Quarto `#exm-` div.
+Never leave a named example as a plain markdown section.
+
+**Correct:**
+```qmd
+:::{#exm-wcgs-marginal-rd}
+
+##### Example: Marginal risk difference
+
+Content of the example...
+
+:::
+```
+
+**Incorrect:**
+```qmd
+##### Example: Marginal risk difference
+
+Content of the example...
+```
+
+- Use an id beginning `#exm-` (for example, `#exm-wcgs-marginal-rd`)
+- Put the example title in a heading inside the div,
+  at the heading level matching the surrounding section depth
+- All content for the example (setup, computation, interpretation)
+  should live inside the div
+
 ## Div Titles vs. Markdown Headings
 
 **CRITICAL**: Div titles (headings inside divs like `:::{#def-...}`, `:::{#thm-...}`, `:::{#exm-...}`, etc.) are NOT the same as regular markdown headings.
@@ -822,6 +944,7 @@ Content here.
 
 More content.
 ```
+When a subfile begins with a theorem-type div (`#thm-`, `#lem-`, `#cor-`, `#prp-`, `#cnj-`, `#def-`, `#exm-`, `#exr-`, `#rem-`), place the preceding `{{< slidebreak >}}` in the **parent** file (before the `{{< include >}}`), not inside the subfile. The subfile itself should not start with `{{< slidebreak >}}`.
 
 ## Computer Algebra Systems (CAS)
 

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -41,6 +41,8 @@ Before committing any `.qmd`, `.R`, or config file change:
 
 ### Quarto
 - Use `{{< slidebreak >}}` instead of `---` for slide breaks
+- Add `{{< slidebreak >}}` immediately before every theorem-type div (`#thm-`, `#lem-`, `#cor-`, `#prp-`, `#cnj-`, `#def-`, `#exm-`, `#exr-`, `#rem-`)
+- When a subfile begins with a theorem-type div, put the preceding `{{< slidebreak >}}` in the **parent** file (before the `{{< include >}}`), not inside the subfile
 - Default to `#| code-fold: true` for figure/table chunks
 - Use div format (`:::{#fig-...}`) for figures and tables, not chunk-option `fig-cap`/`tbl-cap`
 - Do not indent `:::` fenced div markers inside lists

diff --git a/_subfiles/logistic-regression/_exm-collapsibility.qmd b/_subfiles/logistic-regression/_exm-collapsibility.qmd
@@ -0,0 +1,68 @@
+:::{#exm-collapsibility}
+
+#### Collapsibility: numerical illustration
+
+Consider a hypothetical scenario with two strata
+($Z = 0$: low-risk, $Z = 1$: high-risk).
+We make two simplifying assumptions:
+
+1. **No confounding** ($A \perp Z$):
+   exposure is independent of the covariate,
+   so $\p(Z = z \mid A = a) = \p(Z = z)$ for all $a$ and $z$.
+2. **Equal stratum sizes** ($\p(Z = 0) = \p(Z = 1) = \tfrac{1}{2}$):
+   both strata have the same marginal probability.
+
+Together, these imply
+$\p(Z = z \mid A = a) = \tfrac{1}{2}$ for all $a$ and $z$,
+so the simple (equal-weight) average of the stratum-specific risks
+equals the observed marginal risk $\pi(a)$ and,
+under the identification assumptions
+(consistency, exchangeability given $Z$, and positivity),
+the causal marginal risk $\pi_a$:
+
+```{r}
+#| label: collapsibility-example
+#| code-fold: true
+
+strata <- tibble::tibble(
+  stratum = c("Z = 0 (low risk)", "Z = 1 (high risk)"),
+  pi_0    = c(0.05, 0.30),
+  pi_1    = c(0.10, 0.50)
+) |>
+  dplyr::mutate(
+    RD = pi_1 - pi_0,
+    RR = pi_1 / pi_0,
+    OR = (pi_1 / (1 - pi_1)) / (pi_0 / (1 - pi_0))
+  )
+
+pi1_marg <- mean(strata$pi_1)
+pi0_marg <- mean(strata$pi_0)
+
+tibble::tibble(
+  Measure = c("Risk difference", "Risk ratio", "Odds ratio"),
+  Marginal = c(
+    pi1_marg - pi0_marg,
+    pi1_marg / pi0_marg,
+    (pi1_marg / (1 - pi1_marg)) / (pi0_marg / (1 - pi0_marg))
+  ),
+  `Avg. conditional` = c(
+    mean(strata$RD), mean(strata$RR), mean(strata$OR)
+  ),
+  `Marginal = avg. conditional?` = c("Yes", "No", "No")
+) |>
+  knitr::kable(digits = 3)
+```
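+
+Worked by hand from the stratum-specific risks in the chunk above
+(a check of the arithmetic, rounded to three decimals;
+the left column gives the marginal measures,
+the right column the equal-weight averages of the conditional measures):
+
+$$
+\begin{aligned}
+\pi(1) &= \tfrac{1}{2}(0.10 + 0.50) = 0.300,
+& \pi(0) &= \tfrac{1}{2}(0.05 + 0.30) = 0.175 \\
+\text{RD} &= 0.300 - 0.175 = 0.125,
+& \tfrac{1}{2}(0.05 + 0.20) &= 0.125 \\
+\text{RR} &= 0.300 / 0.175 \approx 1.714,
+& \tfrac{1}{2}(2.000 + 1.667) &\approx 1.833 \\
+\text{OR} &= \frac{0.300 / 0.700}{0.175 / 0.825} \approx 2.020,
+& \tfrac{1}{2}(2.111 + 2.333) &\approx 2.222
+\end{aligned}
+$$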
+
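+Even with no confounding, the marginal RR and marginal OR both differ from
+the simple average of their conditional counterparts,
+while the marginal RD equals the average conditional RD exactly.
+The marginal RR (about $1.71$) still lies between the two stratum-specific RRs,
+but the marginal OR (about $2.02$) is smaller than both stratum-specific ORs,
+so no weighted average of them can reproduce it.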
+
+Non-collapsibility is distinct from effect-measure modification.
+Here the stratum-specific effects also vary across strata
+(risk ratios $2.00$ vs. $1.67$, odds ratios $2.11$ vs. $2.33$),
+but that variation is not what drives the discrepancy:
+even if the conditional odds ratio were held *constant* across strata,
+the marginal odds ratio would generally still differ from it
+(and lie closer to the null),
+whereas a constant conditional risk difference
+always reproduces the marginal risk difference.
+
+:::
+
diff --git a/_subfiles/logistic-regression/_sec-OR-alternatives.qmd b/_subfiles/logistic-regression/_sec-OR-alternatives.qmd
@@ -1,16 +1,16 @@
 
-### Objections to odds ratios
+## Objections to odds ratios
 
 {{< include _subfiles/logistic-regression/_sec_OR_objections.qmd >}}
 
-### Deriving risk ratios and risk differences from logistic regression models
+## Deriving risk ratios and risk differences from logistic regression models
 
 {{< include _subfiles/logistic-regression/_sec-logistic-RR-RD.qmd >}}
 
-### Other link functions for Bernoulli outcomes
+## Other link functions for Bernoulli outcomes
 
 {{< include _subfiles/logistic-regression/_sec-non-logistic-bernoulli-models.qmd >}}
 
-### Quasibinomial
+## Quasibinomial
 
 See [Hua Zhou](https://hua-zhou.github.io/)'s [lecture notes](https://ucla-biostat-200c-2020spring.github.io/slides/04-binomial/binomial.html#:~:text=0.05%20%27.%27%200.1%20%27%20%27%201-,Quasi%2Dbinomial,-Another%20way%20to)
diff --git a/_subfiles/logistic-regression/_sec-bootstrap-boot-package.qmd b/_subfiles/logistic-regression/_sec-bootstrap-boot-package.qmd
@@ -0,0 +1,51 @@
+The [`boot`](https://cran.r-project.org/package=boot) package provides a more streamlined interface for bootstrap inference.
+Here's how to compute the same confidence interval using `boot::boot()`:
+
+```{r}
+#| label: boot-package-example
+#| code-fold: show
+#| eval: false
+
+library(boot)
+
+statistic_fn <- function(data, indices) {
+  boot_data <- data[indices, ]
+
+  boot_fit <- glm(
+    chd69_binary ~ dibpat + age,
+    data = boot_data,
+    family = binomial(link = "logit")
+  )
+
+  compute_marginal_rd(
+    model = boot_fit,
+    data = boot_data,
+    exposure_var = "dibpat",
+    exposed_level = "Type A",
+    unexposed_level = "Type B"
+  )
+}
+
+set.seed(20260512)
+boot_results <- boot(
+  data = wcgs_clean,
+  statistic = statistic_fn,
+  # Illustrative value (this chunk is not evaluated); increase to 2000+ for final analyses.
+  R = 300
+)
+
+boot_results
+
+boot.ci(boot_results, type = c("perc", "bca"))
+```
+
+The [`boot`](https://cran.r-project.org/package=boot) package provides several types of confidence intervals,
+including the percentile method (`perc`)
+and the bias-corrected and accelerated (BCa) method (`bca`),
+which can provide better coverage in some situations.
+The BCa method is more demanding than the percentile method:
+its bias-correction and acceleration estimates are unstable
+at the illustrative `R = 300` used here
+(and can warn or fail outright),
+so use a substantially larger `R` (at least 2000)
+before relying on `bca` intervals.
diff --git a/_subfiles/logistic-regression/_sec-bootstrap-inference.qmd b/_subfiles/logistic-regression/_sec-bootstrap-inference.qmd
@@ -0,0 +1,42 @@
+Adapted from
+[@vittinghoff2e, Section 3.6, p. 62],
+[@hastie2009esl2e, Section 7.11, p. 249],
+and
+[@james2021islr2e, Chapter 5, Section 5.2, p. 209].
+
+The bootstrap is a resampling method
+that allows us to estimate the sampling distribution
+of a statistic without making strong parametric assumptions.
+For an introduction to the bootstrap,
+see [Bootstrap Confidence Intervals](basic-statistical-methods.qmd#sec-bootstrap-ci).
+
+#### Bootstrap algorithm
+
+To construct a bootstrap confidence interval
+for a marginal risk difference:
+
+1. For $b = 1, \ldots, B$ (e.g., $B = 1000$):
+   a. Draw a bootstrap sample of size $n$ with replacement from the original data
+   b. Fit the logistic regression model to the bootstrap sample
+   c. Compute the marginal risk difference from the fitted model
+2. The bootstrap distribution of the $B$ risk difference estimates
+   approximates the sampling distribution
+3. Construct a confidence interval using the percentile method
+   (e.g., the 2.5th and 97.5th percentiles for a 95% CI)
+
+The bootstrap standard error is the standard deviation
+of the bootstrap distribution.
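+
+In symbols, the last two quantities can be sketched as follows
+($\hat\Delta^*_b$ is notation introduced here for the marginal risk difference
+estimated from bootstrap sample $b$, and $\bar\Delta^*$ for the mean of those $B$ estimates):
+
+$$
+\widehat{\text{SE}}_{\text{boot}}
+= \sqrt{\frac{1}{B - 1} \sum_{b=1}^{B} \left(\hat\Delta^*_b - \bar\Delta^*\right)^2},
+\qquad
+\text{CI}_{0.95} = \left[q_{0.025},\; q_{0.975}\right]
+$$
+
+where $q_p$ denotes the $p$-th quantile of $\hat\Delta^*_1, \ldots, \hat\Delta^*_B$.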
+
+{{< slidebreak >}}
+
+:::{#exm-wcgs-marginal-rd}
+
+#### Example: CHD risk and behavioral pattern
+
+{{< include _subfiles/logistic-regression/_sec-wcgs-bootstrap-example.qmd >}}
+
+:::
+
+#### Alternative: Using the boot package
+
+{{< include _subfiles/logistic-regression/_sec-bootstrap-boot-package.qmd >}}