diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diag_residuals.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diag_residuals.qmd
index 4d7189c25..edbc9c6aa 100644
--- a/_subfiles/Linear-models-overview/_sec_linreg_diag_residuals.qmd
+++ b/_subfiles/Linear-models-overview/_sec_linreg_diag_residuals.qmd
@@ -123,6 +123,8 @@ $$\hat{\vY} = H\vY$$
 :::
 
 :::{#thm-resid-unbiased}
+#### Mean and variance of residuals
+
 For an ordinary least squares linear model
 with fitted values $\hat y_i = \dprodf{\vx_i}{\vb}$
 (and fitted-value vector $\hat{\vY}$),
diff --git a/_subfiles/intro-MLEs/_sec-loglik.qmd b/_subfiles/intro-MLEs/_sec-loglik.qmd
index 0af0660bb..d62659578 100644
--- a/_subfiles/intro-MLEs/_sec-loglik.qmd
+++ b/_subfiles/intro-MLEs/_sec-loglik.qmd
@@ -8,6 +8,7 @@ It is typically easier to work with the log of the likelihood function:
 ---
 
 :::{#thm-mle-use-log}
+#### Maximize the log-likelihood instead of the likelihood
 
 The likelihood and log-likelihood have the same maximizer:
 
diff --git a/_subfiles/intro-MLEs/_sec_likelihood.qmd b/_subfiles/intro-MLEs/_sec_likelihood.qmd
index 33e8b2892..32e6d07dc 100644
--- a/_subfiles/intro-MLEs/_sec_likelihood.qmd
+++ b/_subfiles/intro-MLEs/_sec_likelihood.qmd
@@ -103,6 +103,7 @@ $$\Lik_i(\theta) = \P(X_i=x_i)$$
 ---
 
 :::{#thm-ds-lik-obs-lik}
+#### Dataset likelihood as a product of observation likelihoods
 
 For $\iid$ data $\vx \eqdef \x1n$,
 the likelihood of the dataset is equal to the product of the observation-specific likelihood factors:
diff --git a/_subfiles/intro-to-survival-analysis/_sec-cuhaz.qmd b/_subfiles/intro-to-survival-analysis/_sec-cuhaz.qmd
index 509ae57e1..850cd405c 100644
--- a/_subfiles/intro-to-survival-analysis/_sec-cuhaz.qmd
+++ b/_subfiles/intro-to-survival-analysis/_sec-cuhaz.qmd
@@ -2,6 +2,8 @@
 Since $\haz(t) = \deriv{t}\cb{-\log{\surv(t)}}$ (see @thm-h-logS), we also have:
 
 :::{#cor-surv-int-haz}
+#### Survival function from the cumulative hazard
+
 $$\surv(t) = \exp{-\int_{u=0}^t \haz(u)du}$${#eq-surv-int-haz}
 :::
 
diff --git a/_subfiles/intro-to-survival-analysis/_sec-exp-dist.qmd b/_subfiles/intro-to-survival-analysis/_sec-exp-dist.qmd
index af2705234..5432ba908 100644
--- a/_subfiles/intro-to-survival-analysis/_sec-exp-dist.qmd
+++ b/_subfiles/intro-to-survival-analysis/_sec-exp-dist.qmd
@@ -75,6 +75,8 @@ $$
 ---
 
 :::{#thm-mle-exp}
+#### MLE of the exponential rate parameter
+
 Let $T=\sum t_i$ and $U=\sum u_j$. Then:
 
 $$
diff --git a/_subfiles/intro-to-survival-analysis/_sec-inv-survf.qmd b/_subfiles/intro-to-survival-analysis/_sec-inv-survf.qmd
index c5e24b6da..4f7030bc7 100644
--- a/_subfiles/intro-to-survival-analysis/_sec-inv-survf.qmd
+++ b/_subfiles/intro-to-survival-analysis/_sec-inv-survf.qmd
@@ -167,6 +167,7 @@ qexp(p = 0.5, rate = 2)
 {{< slidebreak >}}
 
 :::{#thm-inv-surv-is-quantile}
+#### Inverse survival function is the quantile function
 
 The inverse survival function equals the $(1-p)$th
 [population quantile](probability.qmd#def-quantile-function)
diff --git a/_subfiles/intro-to-survival-analysis/_sec-survf.qmd b/_subfiles/intro-to-survival-analysis/_sec-survf.qmd
index 0f19ed4f4..e934f8b71 100644
--- a/_subfiles/intro-to-survival-analysis/_sec-survf.qmd
+++ b/_subfiles/intro-to-survival-analysis/_sec-survf.qmd
@@ -27,6 +27,7 @@ $$\surv(t) \eqdef \Pr(T > t)$$
 ---
 
 :::{#thm-survival-expressions-1}
+#### Equivalent expressions for the survival function
 
 $$
 \begin{aligned}
@@ -109,6 +110,7 @@ ggplot() +
 ---
 
 :::{#thm-surv-fn-as-mean-status}
+#### Survival function as expected survival status
 
 If $A_t$ represents survival status at time $t$, with $A_t = 1$ denoting alive at time $t$ and $A_t = 0$ denoting deceased at time $t$, then:
 
@@ -119,6 +121,7 @@ $$\surv(t) = \P(A_t=1) = \E{A_t}$$
 ---
 
 :::{#thm-surv-and-mean}
+#### Mean as the integral of the survival function
 
 If $T$ is a nonnegative random variable, then:
 
diff --git a/_subfiles/logistic-regression/_sec-d_odds-d_logodds.qmd b/_subfiles/logistic-regression/_sec-d_odds-d_logodds.qmd
index 0ab133c5e..45198741a 100644
--- a/_subfiles/logistic-regression/_sec-d_odds-d_logodds.qmd
+++ b/_subfiles/logistic-regression/_sec-d_odds-d_logodds.qmd
@@ -1,4 +1,6 @@
 :::{#lem-deriv-invodds}
+#### Derivative of odds w.r.t. log-odds
+
 $$\derivf{\odds}{\logodds} = \odds$$
 
 :::
@@ -26,6 +28,8 @@ $$
 
 :::{#thm-d_odds-d_logodds}
 
+#### Derivative of odds w.r.t. log-odds, in terms of probability
+
 $$\derivf{\omega}{\eta} = \frac{\pi}{1-\pi}$${#eq-d_omega-d_eta}
 
 :::
diff --git a/_subfiles/logistic-regression/_sec_OR-ratio-ratio.qmd b/_subfiles/logistic-regression/_sec_OR-ratio-ratio.qmd
index 4293b445d..1ae597ede 100644
--- a/_subfiles/logistic-regression/_sec_OR-ratio-ratio.qmd
+++ b/_subfiles/logistic-regression/_sec_OR-ratio-ratio.qmd
@@ -5,6 +5,8 @@ so odds ratios are ratios of ratios:
 :::
 
 :::{#thm-or-ratio-ratio}
+#### Odds ratio as a ratio of ratios
+
 $$
 \ba
 \ratio(\odds_1, \odds_2)
diff --git a/_subfiles/logistic-regression/_sec_OR_logistic.qmd b/_subfiles/logistic-regression/_sec_OR_logistic.qmd
index 24dd4f313..30b077988 100644
--- a/_subfiles/logistic-regression/_sec_OR_logistic.qmd
+++ b/_subfiles/logistic-regression/_sec_OR_logistic.qmd
@@ -197,6 +197,8 @@ $$
 
 :::{#thm-logistic-OR}
 
+#### Odds ratio from difference in covariate patterns
+
 The odds ratio comparing covariate patterns $\vx$ and $\vxs$ is:
 
 {{< include _subfiles/logistic-regression/_eq_OR_delta.qmd >}}
@@ -211,6 +213,8 @@ By @sol-simplify-logistic-OR.
 
 :::{#cor-log-or}
 
+#### Log odds ratio equals the difference in log-odds
+
 $$\logf {\ror(\vx,\vxs)} = \difflogodds$$
 
 :::
diff --git a/_subfiles/logistic-regression/_sec_d-pi_d-eta.qmd b/_subfiles/logistic-regression/_sec_d-pi_d-eta.qmd
index 580457d07..1e705d18d 100644
--- a/_subfiles/logistic-regression/_sec_d-pi_d-eta.qmd
+++ b/_subfiles/logistic-regression/_sec_d-pi_d-eta.qmd
@@ -1,5 +1,7 @@
 :::{#thm-d_prob-d_logodds}
 
+#### Derivative of probability w.r.t. log-odds
+
 $$\derivf{\prob}{\logodds} = \pi (1-\pi)$$
 :::
 
@@ -39,6 +41,8 @@ $$
 
 :::{#cor-d_pi-d_eta-var}
 
+#### Derivative of probability w.r.t. linear predictor as a variance
+
 If $\pi = \Pr(Y=1| \vX=\vx)$, then:
 
 $$\derivf{\pi}{\eta} = \Varf{Y|X=x}$$
diff --git a/_subfiles/logistic-regression/_sec_derive_logistic_loglik.qmd b/_subfiles/logistic-regression/_sec_derive_logistic_loglik.qmd
index 201c96830..17f594da4 100644
--- a/_subfiles/logistic-regression/_sec_derive_logistic_loglik.qmd
+++ b/_subfiles/logistic-regression/_sec_derive_logistic_loglik.qmd
@@ -41,6 +41,8 @@ $$
 
 :::{#lem-logistic-loglik-component}
 
+#### Per-observation log-likelihood component
+
 $$\ell_i(\pi_i) = y_i \eta_i - \logf{1+\odds_i}$$
 
 :::
diff --git a/_subfiles/logistic-regression/_sec_expit.qmd b/_subfiles/logistic-regression/_sec_expit.qmd
index 333865654..467841016 100644
--- a/_subfiles/logistic-regression/_sec_expit.qmd
+++ b/_subfiles/logistic-regression/_sec_expit.qmd
@@ -4,6 +4,8 @@
 
 :::{#thm-prob-from-logodds}
 
+#### Probability as a function of log-odds
+
 ::: notes
 If $\prob$ is the probability of an event $A$,
 $\odds$ is the corresponding odds of $A$,
@@ -61,6 +63,7 @@ Details left to the reader.
 ---
 
 :::{#thm-expit-prob-logodds}
+#### Probability via the expit function
 If $\prob$ is the probability of an event $A$,
 $\odds$ is the corresponding odds of $A$,
 and $\logodds$ is the corresponding log-odds of $A$,
diff --git a/_subfiles/logistic-regression/_sec_invodds.qmd b/_subfiles/logistic-regression/_sec_invodds.qmd
index 79ebaeb0b..f4ece4df3 100644
--- a/_subfiles/logistic-regression/_sec_invodds.qmd
+++ b/_subfiles/logistic-regression/_sec_invodds.qmd
@@ -52,6 +52,8 @@ $$
 
 :::{#thm-odds-to-prob}
 
+#### Probability as a function of odds
+
 If $\pi$ is the probability of an event
 and $\omega$ is the corresponding odds of that event,
 then:
@@ -86,6 +88,8 @@ can be called the **inverse-odds function**.
 
 :::{#cor-invodds-pi}
 
+#### Probability via the inverse-odds function
+
 $$\prob = \invoddsf{\odds}$$
 :::
 
@@ -100,6 +104,8 @@ By @def-inv-odds and @thm-odds-to-prob.
 
 :::{#cor-invodds-odds-inv}
 
+#### Inverse-odds function inverts the odds function
+
 $$\invoddsf{\odds} = \oddsinvf{\odds}$$
 
 :::
@@ -252,6 +258,7 @@ $$
 ---
 
 :::{#cor-inverse-odds-nonevent}
+#### One plus odds in terms of non-event probability
 $$1+\odds = \frac{1}{1-\prob}$$
 :::
 
diff --git a/_subfiles/logistic-regression/_sec_logistic_score_fn.qmd b/_subfiles/logistic-regression/_sec_logistic_score_fn.qmd
index d75309c04..ff0e26fd4 100644
--- a/_subfiles/logistic-regression/_sec_logistic_score_fn.qmd
+++ b/_subfiles/logistic-regression/_sec_logistic_score_fn.qmd
@@ -4,6 +4,8 @@ As usual, by independence, we have:
 
 :::{#lem-score-logistic}
 
+#### Score function decomposes over observations
+
 $$
 \ba
 \brown{\vec{\llik'}(\vb)}
@@ -22,6 +24,8 @@ we can apply the [vector chain rule](math-prereqs.qmd#thm-chain-vec):
 
 :::{#lem-logistic-score-comp}
 
+#### Chain rule applied to the score component
+
 $$
 \ba
 \magenta{\vec{\llik_i'}(\vb)}
@@ -38,6 +42,8 @@ $$
 
 :::{#lem-d_logodds-d_vb}
 
+#### Derivative of log-odds w.r.t. coefficients
+
 By [the derivative of a linear combination](math-prereqs.qmd#thm-deriv-lincom):
 
 $$
@@ -90,6 +96,7 @@ $$
 
 
 :::{#thm-logistic-score-comp}
+#### Score component for one observation
 $$\magenta{\llik_i'(\vb)} = \magenta{\vx_i \err_i}$${#eq-score-comp}
 :::
 
@@ -106,6 +113,8 @@ we have:
 
 :::{#thm-logistic-score-fn}
 
+#### Logistic-model score function
+
 $$
 \ba
 \brown{\vec{\llik'}(\vb)} &= \sumin \magenta{\llik_i'(\vb)}\\
diff --git a/_subfiles/logistic-regression/_sec_logistic_slope_mean.qmd b/_subfiles/logistic-regression/_sec_logistic_slope_mean.qmd
index 7c5f26281..38f7fdf53 100644
--- a/_subfiles/logistic-regression/_sec_logistic_slope_mean.qmd
+++ b/_subfiles/logistic-regression/_sec_logistic_slope_mean.qmd
@@ -2,6 +2,8 @@
 
 :::{#lem-d_logodds-d_x}
 
+#### Derivative of log-odds w.r.t. predictor
+
 By [the derivative of a linear combination](math-prereqs.qmd#thm-deriv-lincom):
 
 $$
diff --git a/_subfiles/logistic-regression/_sec_logit.qmd b/_subfiles/logistic-regression/_sec_logit.qmd
index 11ba85288..b720bfc00 100644
--- a/_subfiles/logistic-regression/_sec_logit.qmd
+++ b/_subfiles/logistic-regression/_sec_logit.qmd
@@ -15,6 +15,8 @@ $$\logodds \eqdef \logf{\omega}$${#eq-def-logodds}
 
 :::{#thm-logodds-pi}
 
+#### Log-odds as a function of probability
+
 If $\prob$ is the probability of an event $A$,
 $\odds$ is the corresponding odds of $A$,
 and $\eta$ is the corresponding log-odds of $A$,
@@ -81,6 +83,7 @@ Apply @def-logit-fn and then @def-odds (details left to the reader).
 ---
 
 :::{#cor-logodds-logit}
+#### Log-odds via the logit function
 If $\prob$ is the probability of an event $A$
 and $\logodds$ is the corresponding log-odds of $A$,
 then:
diff --git a/_subfiles/logistic-regression/_sec_odds_fn.qmd b/_subfiles/logistic-regression/_sec_odds_fn.qmd
index e4913051a..b5e0fcd38 100644
--- a/_subfiles/logistic-regression/_sec_odds_fn.qmd
+++ b/_subfiles/logistic-regression/_sec_odds_fn.qmd
@@ -22,6 +22,7 @@ $$
 ---
 
 :::{#thm-prob-to-odds}
+#### Odds as a function of probability
 If $\prob$ is the probability of an event $A$
 and $\odds$ is the corresponding odds of $A$,
 then:
@@ -64,6 +65,7 @@ which is easier to remember and manipulate:
 :::
 
 :::{#cor-oddsf-to-odds}
+#### Odds via the odds function
 If $\prob$ is the probability of an outcome $A$
 and $\odds$ is the corresponding odds of $A$,
 then:
diff --git a/_subfiles/logistic-regression/_sec_odds_of_rare_events.qmd b/_subfiles/logistic-regression/_sec_odds_of_rare_events.qmd
index f69f49e97..71e44bde0 100644
--- a/_subfiles/logistic-regression/_sec_odds_of_rare_events.qmd
+++ b/_subfiles/logistic-regression/_sec_odds_of_rare_events.qmd
@@ -50,6 +50,8 @@ $$
 
 :::{#thm-odds-minus-probs}
 
+#### Difference between odds and probability
+
 Let $\odds = \frac{\pi}{1-\pi}$. Then:
 
 $$\odds - \pi = \frac{\pi^2}{1-\pi}$$
diff --git a/_subfiles/logistic-regression/_sec_overview_bernoulli_models.qmd b/_subfiles/logistic-regression/_sec_overview_bernoulli_models.qmd
index d89ee30fe..9588af8ee 100644
--- a/_subfiles/logistic-regression/_sec_overview_bernoulli_models.qmd
+++ b/_subfiles/logistic-regression/_sec_overview_bernoulli_models.qmd
@@ -11,6 +11,8 @@ What is logistic regression?
 :::{#sol-def-logistic-regression}
 
 :::{#def-logistic-regression}
+#### Logistic regression model
+
 **Logistic regression** is a framework for modeling [binary](data.qmd#def-binary) outcomes, conditional on one or more *predictors* (a.k.a. *covariates*).
 :::
 
diff --git a/_subfiles/logistic-regression/_thm-d_odds_d_beta.qmd b/_subfiles/logistic-regression/_thm-d_odds_d_beta.qmd
index a9feb13f5..c1f396e39 100644
--- a/_subfiles/logistic-regression/_thm-d_odds_d_beta.qmd
+++ b/_subfiles/logistic-regression/_thm-d_odds_d_beta.qmd
@@ -1,4 +1,5 @@
 :::{#thm-d_odds_d_beta}
+#### Gradient of odds w.r.t. coefficients
 
 ::: notes
 To derive $\derivf{\odds}{\vb}$,
@@ -19,6 +20,8 @@ $$
 
 :::{#cor-d_odds_d_beta}
 
+#### Gradient of odds w.r.t. coefficients in terms of probability
+
 $$
 \ba
 \derivf{\odds}{\vb}
diff --git a/_subfiles/logistic-regression/_thm-d_pi_d_beta.qmd b/_subfiles/logistic-regression/_thm-d_pi_d_beta.qmd
index 6b741e07d..8ec56cc7e 100644
--- a/_subfiles/logistic-regression/_thm-d_pi_d_beta.qmd
+++ b/_subfiles/logistic-regression/_thm-d_pi_d_beta.qmd
@@ -2,7 +2,9 @@
 
 :::{#thm-d_pi_d_beta}
 
-Using 
+#### Gradient of fitted probability w.r.t. coefficients
+
+Using
 @lem-d_logodds-d_vb and 
 @thm-d_prob-d_logodds:
 
diff --git a/_subfiles/logistic-regression/_thm_odds-from-logodds.qmd b/_subfiles/logistic-regression/_thm_odds-from-logodds.qmd
index 4edd4612d..cb07dcf1d 100644
--- a/_subfiles/logistic-regression/_thm_odds-from-logodds.qmd
+++ b/_subfiles/logistic-regression/_thm_odds-from-logodds.qmd
@@ -1,4 +1,5 @@
 :::{#lem-odds-from-logodds}
+#### Odds from log-odds
 
 ::: notes
 If $\odds$ is the odds of an event $A$
diff --git a/_subfiles/logistic-regression/_thms-deriv-odds.qmd b/_subfiles/logistic-regression/_thms-deriv-odds.qmd
index 609035c73..4920650a3 100644
--- a/_subfiles/logistic-regression/_thms-deriv-odds.qmd
+++ b/_subfiles/logistic-regression/_thms-deriv-odds.qmd
@@ -30,6 +30,8 @@ $$
 
 :::{#cor-deriv-odds}
 
+#### Derivative of odds function in terms of odds
+
 $$\derivf{\odds}{\prob} = \sqf{1+\odds}$$
 
 :::
diff --git a/_subfiles/misc/_cor-deriv-expit.qmd b/_subfiles/misc/_cor-deriv-expit.qmd
index c8b34efff..d66c684dc 100644
--- a/_subfiles/misc/_cor-deriv-expit.qmd
+++ b/_subfiles/misc/_cor-deriv-expit.qmd
@@ -1,3 +1,5 @@
 :::{#cor-deriv-expit}
+#### Derivative of expit
+
 $$\dexpitf{\logodds} = (\expitf{\logodds}) (1 - \expitf{\logodds})$$
 :::
diff --git a/_subfiles/misc/_cor-deriv-invodds.qmd b/_subfiles/misc/_cor-deriv-invodds.qmd
index 088d18a2e..e628dda1c 100644
--- a/_subfiles/misc/_cor-deriv-invodds.qmd
+++ b/_subfiles/misc/_cor-deriv-invodds.qmd
@@ -1,5 +1,7 @@
 
 :::{#cor-deriv-invodds}
 
+#### Derivative of inverse-odds function
+
 $$\doddsinvf{\odds} = \sqf{1 - \invoddsf{\odds}}$$
 :::
diff --git a/_subfiles/misc/_cor_prob-nonevent.qmd b/_subfiles/misc/_cor_prob-nonevent.qmd
index 42bdbd7bd..ca64e7198 100644
--- a/_subfiles/misc/_cor_prob-nonevent.qmd
+++ b/_subfiles/misc/_cor_prob-nonevent.qmd
@@ -1,5 +1,7 @@
 :::{#cor-inverse-odds-nonevent2}
 
+#### Probability of a non-event from the odds
+
 If $\prob$ is the probability of event $A$
 and $\odds$ is the corresponding odds of event $A$,
 then the probability that $A$ does not occur is:
diff --git a/_subfiles/misc/_lem-one-minus-expit.qmd b/_subfiles/misc/_lem-one-minus-expit.qmd
index 727192c81..d5b604fdf 100644
--- a/_subfiles/misc/_lem-one-minus-expit.qmd
+++ b/_subfiles/misc/_lem-one-minus-expit.qmd
@@ -1,4 +1,6 @@
 :::{#lem-one-minus-expit}
+#### One minus expit
+
 $$1-\expitf{\logodds} = \inv{1+\exp{\logodds}}$$
 :::
 
diff --git a/_subfiles/proportional-hazards-models/_cor-hazard-ratio-vs-baseline.qmd b/_subfiles/proportional-hazards-models/_cor-hazard-ratio-vs-baseline.qmd
index 1526b8836..4839ead39 100644
--- a/_subfiles/proportional-hazards-models/_cor-hazard-ratio-vs-baseline.qmd
+++ b/_subfiles/proportional-hazards-models/_cor-hazard-ratio-vs-baseline.qmd
@@ -1,5 +1,7 @@
 :::{#cor-hazard-ratio-vs-baseline}
 
+#### Hazard factor from difference of log-hazard from baseline
+
 $$\hazfactor(t|\vx)= \expf{\diffloghaz(t|\vx)}$$
 
 :::
diff --git a/_subfiles/proportional-hazards-models/_def-ph-model.qmd b/_subfiles/proportional-hazards-models/_def-ph-model.qmd
index 788af007a..ad389238a 100644
--- a/_subfiles/proportional-hazards-models/_def-ph-model.qmd
+++ b/_subfiles/proportional-hazards-models/_def-ph-model.qmd
@@ -27,6 +27,8 @@ Equivalently:
 
 :::{#lem-ph-lincomp}
 
+#### Log-hazard as baseline plus a linear combination
+
 In a proportional hazards model (that is, if @eq-ph-diffloghaz holds):
 
 $$
diff --git a/_subfiles/proportional-hazards-models/_sec-surv-conditional-hazards.qmd b/_subfiles/proportional-hazards-models/_sec-surv-conditional-hazards.qmd
index 5cd03c915..903f8dba0 100644
--- a/_subfiles/proportional-hazards-models/_sec-surv-conditional-hazards.qmd
+++ b/_subfiles/proportional-hazards-models/_sec-surv-conditional-hazards.qmd
@@ -92,6 +92,8 @@ here $\loghaz(t|\vx)$ depends on **both** $t$ **and** $\vx$.
 {{< slidebreak >}}
 
 :::{#thm-haz-from-loghaz}
+#### Hazard from log-hazard
+
 $$
 \ba
 \haz(t|\vx) &= \expf{\loghaz(t|\vx)}
@@ -132,6 +134,8 @@ $$
 
 :::{#cor-diffloghaz-log-HR}
 
+#### Difference of log-hazard from baseline equals log of the hazard factor
+
 $$\diffloghaz(t|\vx) = \logf{\hazfactor(t| \vx)}$$
 
 :::
diff --git a/_subfiles/proportional-hazards-models/_sec-understand-coxph.qmd b/_subfiles/proportional-hazards-models/_sec-understand-coxph.qmd
index 06d6c2a04..bb6b31eeb 100644
--- a/_subfiles/proportional-hazards-models/_sec-understand-coxph.qmd
+++ b/_subfiles/proportional-hazards-models/_sec-understand-coxph.qmd
@@ -22,7 +22,9 @@ we will indicate this dependence by extending our notation for hazard:
 
 :::{#lem-diffloghaz-ph}
 
-If $\loghaz(t|\vx) = \loghaz_0(t) + \reglincomb$, then: 
+#### Difference of log-hazards between two covariate patterns
+
+If $\loghaz(t|\vx) = \loghaz_0(t) + \reglincomb$, then:
 
 $$
 \ba
@@ -37,6 +39,8 @@ $$
 
 :::{#thm-hazard-ratio-ph}
 
+#### Hazard ratio under proportional hazards
+
 If $\loghaz(t|\vx) = \loghaz_0(t) + \reglincomb$, then:
 
 $$
@@ -90,6 +94,8 @@ $$\hr(t| \vx : \vxs)  = \hr(\vx : \vxs)$$
 {{< slidebreak >}}
 
 :::{#lem-ph-diffloghaz-0}
+#### Difference of log-hazard from baseline
+
 $$\diffloghaz(t|\vx)= \reglincomb$${#eq-diffloghaz-0-ph}
 :::
 
@@ -97,7 +103,9 @@ $$\diffloghaz(t|\vx)= \reglincomb$${#eq-diffloghaz-0-ph}
 
 :::{#thm-hazard-ratio-vs-baseline-ph}
 
-If $\loghaz(t|\vx) = \loghaz_0(t) + \reglincomb$, then: 
+#### Hazard ratio versus baseline under proportional hazards
+
+If $\loghaz(t|\vx) = \loghaz_0(t) + \reglincomb$, then:
 
 $$\hazfactor(t|\vx) = \expf{\reglincomb}$$
 
@@ -123,6 +131,8 @@ $$
 {{< slidebreak >}}
 
 :::{#thm-ph-haz-decomp}
+#### Proportional-hazards decomposition of the hazard
+
 $$\haz(t|\vx) = \haz_0(t)\hazfactor(\vx)$$
 :::
 
@@ -134,6 +144,8 @@ Also:
 
 :::{#thm-ph-also}
 
+#### Equivalent forms of the proportional-hazards model
+
 $$
 \ba
 \hazfactor(\vx) &= \expf{\diffloghaz(\vx)}
@@ -212,6 +224,8 @@ As we saw above, Cox's proportional hazards model has this property, with $\hr(\
 
 :::{#thm-haz-ratio-notations}
 
+#### Relating the hazard-ratio and hazard-factor notations
+
 ::: notes
 We are using two similar notations,
 $\hr(\vx,\vxs)$ and $\hazfactor(\vx)$.
@@ -246,6 +260,8 @@ $$
 Hence on the log scale, we have:
 
 :::{#thm-diff-loghaz-lincom}
+#### Difference of log-hazards is a linear combination
+
 $$
 \ba
 \logf{\frac{\haz(t|\vx)}{\haz(t|\vxs)}}
diff --git a/chapters/algebra.qmd b/chapters/algebra.qmd
index 98afa35ec..77c27b726 100644
--- a/chapters/algebra.qmd
+++ b/chapters/algebra.qmd
@@ -36,6 +36,7 @@ If $a = b$, then for any function $f(x)$, $f(a) = f(b)$
 ### Inequalities
 
 :::{#thm-add-ineq}
+#### Adding to both sides of an inequality
 
 If $a<b$, then $a+c < b+c$
 
@@ -52,6 +53,7 @@ If $a < b$, then: $-a > -b$
 ---
 
 :::{#thm-mult-ineq}
+#### Multiplying both sides of an inequality by a nonnegative number
 If $a < b$ and $c \geq 0$, then $ca \leq cb$.
 
 :::
@@ -59,6 +61,7 @@ If $a < b$ and $c \geq 0$, then $ca < cb$.
 ---
 
 :::{#thm-negative-one}
+#### Negation is multiplication by $-1$
 
 $$-a = (-1)*a$$
 
diff --git a/chapters/basic-statistical-methods.qmd b/chapters/basic-statistical-methods.qmd
index 3476ad07a..5eea73314 100644
--- a/chapters/basic-statistical-methods.qmd
+++ b/chapters/basic-statistical-methods.qmd
@@ -56,6 +56,7 @@ See @vittinghoff2e, §3.2.
 ### Sample mean
 
 ::: {#def-sample-mean}
+#### Sample mean
 
 The **sample mean** of $n$ observations $x_1, \ldots, x_n$ is:
 
@@ -65,6 +66,7 @@ $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
 ### Sample variance
 
 ::: {#def-sample-variance}
+#### Sample variance
 
 The **sample variance** is:
 
@@ -78,6 +80,7 @@ of the population variance $\sigma^2$.
 ### Sample standard deviation
 
 ::: {#def-sample-sd}
+#### Sample standard deviation
 
 The **sample standard deviation** is $s = \sqrt{s^2}$.
 It is expressed in the same units as the original data,
@@ -87,6 +90,7 @@ making it more interpretable than the variance.
 ### Sample median
 
 ::: {#def-sample-median}
+#### Sample median
 
 The **sample median** is the middle value
 when observations are sorted in ascending order.
@@ -102,6 +106,7 @@ The median is more robust to outliers than the mean.
 ### Interquartile range
 
 ::: {#def-IQR}
+#### Interquartile range
 
 The **interquartile range (IQR)** is the difference between the 75th percentile
 (the third quartile, $Q_3$) and the 25th percentile
@@ -117,6 +122,7 @@ Like the median, the IQR is robust to outliers.
 ### Sample proportion
 
 ::: {#def-sample-proportion}
+#### Sample proportion
 
 For a binary outcome,
 the **sample proportion** of "successes" (coded as 1) is:
@@ -268,6 +274,7 @@ See @vittinghoff2e, §3.3.
 ### Null hypothesis
 
 ::: {#def-null-hypothesis}
+#### Null hypothesis
 
 The **null hypothesis** $H_0$ is a specific claim
 about the population parameter(s) that we test against the data.
@@ -280,6 +287,7 @@ $$H_0: \mu_1 = \mu_2$$
 ### Alternative hypothesis
 
 ::: {#def-alternative-hypothesis}
+#### Alternative hypothesis
 
 The **alternative hypothesis** $H_1$ (or $H_A$) is the claim
 we are trying to find evidence for.
@@ -293,6 +301,7 @@ $$H_1: \mu_1 \neq \mu_2$$
 ### Definition
 
 ::: {#def-two-sample-t-test}
+#### Two-sample t-test
 
 The **two-sample t-test** (Welch's t-test)
 tests whether the means of two independent groups are equal.
@@ -342,6 +351,7 @@ t.test(glucose_HT, glucose_placebo)
 ### Definition
 
 ::: {#def-one-sample-t-test}
+#### One-sample t-test
 
 The **one-sample t-test** tests whether the mean of a single population
 equals a specified null value $\mu_0$:
@@ -360,6 +370,7 @@ Under $H_0$, $t \sim t_{n-1}$ (a t-distribution with $n-1$ degrees of freedom).
 ### Definition
 
 ::: {#def-paired-t-test}
+#### Paired t-test
 
 The **paired t-test** compares two related measurements
 (e.g., pre- and post-treatment values from the same subjects).
@@ -402,6 +413,7 @@ to compare means across $k \geq 2$ groups.
 ## Definition
 
 ::: {#def-one-way-anova}
+#### One-way ANOVA
 
 In a **one-way ANOVA**, we test:
 
@@ -447,6 +459,7 @@ See @vittinghoff2e, §3.5.
 ### Definition
 
 ::: {#def-contingency-table}
+#### Contingency table
 
 A **contingency table** (cross-tabulation) displays
 the joint frequencies of two categorical variables.
@@ -490,6 +503,7 @@ hers |>
 ### Definition
 
 ::: {#def-chi-square-test}
+#### Pearson chi-square test
 
 The **Pearson chi-square test** tests whether two categorical variables are independent.
 For a $2 \times 2$ table,
@@ -520,6 +534,7 @@ chisq.test(hers$exercise, hers$HT)
 ### Definition
 
 ::: {#def-fishers-exact}
+#### Fisher's exact test
 
 **Fisher's exact test** computes the exact probability of observing
 a $2 \times 2$ table at least as extreme as the observed table,
@@ -552,6 +567,7 @@ See @vittinghoff2e, §3.6.
 ### Definition
 
 ::: {#def-pearson-r}
+#### Pearson correlation coefficient
 
 The **Pearson correlation coefficient** measures the strength and direction
 of the linear association between two continuous variables $X$ and $Y$:
@@ -579,6 +595,7 @@ cor.test(hers$BMI, hers$glucose, method = "pearson")
 ### Definition
 
 ::: {#def-spearman-r}
+#### Spearman rank correlation
 
 The **Spearman rank correlation** $r_S$
 is the Pearson correlation computed on the *ranks* of the observations.
@@ -605,6 +622,7 @@ See @vittinghoff2e, §3.6 and [Linear Models Overview](Linear-models-overview.qm
 ### Definition
 
 ::: {#def-slr}
+#### Simple linear regression
 
 A **simple linear regression** model relates
 a continuous outcome $Y$ to a single predictor $X$:
@@ -657,6 +675,7 @@ for each 1 kg/m² increase in BMI.
 ### Definition
 
 ::: {#def-r-squared}
+#### Coefficient of determination ($R^2$)
 
 The **coefficient of determination** $R^2$ measures the proportion of the total
 variance in $Y$ that is explained by the linear regression on $X$:
diff --git a/chapters/negbinom.qmd b/chapters/negbinom.qmd
index 60468a442..ae82eafb5 100644
--- a/chapters/negbinom.qmd
+++ b/chapters/negbinom.qmd
@@ -26,6 +26,7 @@ which brings us back to the Poisson distribution.
 ---
 
 :::{#thm-nb}
+#### Mean and variance of the negative binomial distribution
 If $Y \sim \NegBin(\mu, \rho)$, then:
 
 - $\Expp[Y] = \mu$
diff --git a/chapters/parametric-survival-models.qmd b/chapters/parametric-survival-models.qmd
index be9107d53..1a6447f6f 100644
--- a/chapters/parametric-survival-models.qmd
+++ b/chapters/parametric-survival-models.qmd
@@ -124,6 +124,7 @@ ggplot() +
 ### Properties of Weibull hazard functions
 
 :::{#thm-weibull-props}
+#### Properties of Weibull hazard functions
 
 If $T$ has a Weibull distribution, then:
 
diff --git a/chapters/poisson.qmd b/chapters/poisson.qmd
index 40dce34b9..4eb8f7e21 100644
--- a/chapters/poisson.qmd
+++ b/chapters/poisson.qmd
@@ -292,6 +292,7 @@ Start from definition of event rate and use algebra to solve for $\mu$.
 ---
 
 ::: {#thm-non-exposed}
+#### No exposure means no expected events
 When the exposure magnitude is 0, there is no opportunity for events to occur:
 
 $$\Expp[Y|T=0] = 0$$
@@ -314,6 +315,7 @@ In other words, this model assumes that if there is no exposure, there can't be
 :::
 
 :::{#thm-exposure-log-scale}
+#### Exposure is additive on the log scale
 
 If $\mu = \lambda\cdot t$, then:
 
@@ -332,8 +334,9 @@ that term is called an **offset**.
 ---
 
 :::{#thm-sum-pois}
+#### Sum of independent Poisson random variables
 
-If $X$ and $Y$ are independent Poisson random variables with means 
+If $X$ and $Y$ are independent Poisson random variables with means
 $\mu_X$ and $\mu_Y$, their sum, $Z=X+Y$, is also a Poisson random variable, with mean 
 $\mu_Z = \mu_X + \mu_Y$.
 
diff --git a/chapters/probability.qmd b/chapters/probability.qmd
index 1d5b21c37..058ed327b 100644
--- a/chapters/probability.qmd
+++ b/chapters/probability.qmd
@@ -71,6 +71,7 @@ and underpins the @thm-total-prob for countable partitions.
 ---
 
 :::{#thm-prob-subset}
+#### Probability of a subset's intersection
 If $A$ and $B$ are statistical events and $A\subseteq B$, then $\Pr(A \cap B) = \Pr(A)$.
 :::
 
@@ -83,6 +84,7 @@ Left to the reader for now.
 ---
 
 :::{#thm-total-prob-1}
+#### Probabilities of an event and its complement sum to 1
 $$\Pr(A) + \Pr(\neg A) = 1$$
 :::
 
@@ -95,6 +97,7 @@ By properties 2 and 3 of @def-probability.
 ---
 
 :::{#cor-p-neg0}
+#### Complement rule
 $$\Pr(\neg A) = 1 - \Pr(A)$$
 :::
 
@@ -107,6 +110,7 @@ By @thm-total-prob-1 and algebra.
 ---
 
 :::{#cor-p-neg}
+#### Complement rule in probability ($\pi$) notation
 
 If the probability of an outcome $A$ is $\Pr(A)=\pi$,
 then the probability that $A$ does not occur is:
@@ -1431,6 +1435,7 @@ $$\Cov{X,Y} \eqdef \Expf{(X - \E X)(Y - \E Y)}$$
 ---
 
 :::{#thm-alt-cov}
+#### Alternative formula for covariance
 $$\Cov{X,Y}= \E{XY} - \E{X} \E{Y}$$
 :::
 
@@ -1643,6 +1648,7 @@ Left to the reader...
 ---
 
 :::{#cor-var-lincom2}
+#### Variance of a linear combination of two random variables
 
 For any two random variables $X$ and $Y$ and scalars $a$ and $b$: