d-morrison · d-morrison · May 16, 2026 · Jun 2, 2026 · Jun 2, 2026 · Jun 2, 2026
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -588,6 +588,10 @@ Key macros to use:
 - **Greek letters**: Use `\b` for $\beta$, `\g` for $\gamma$, `\a` for $\alpha$
 - **Formatting**: Use `\red{...}` and `\blue{...}` for colored text in math
 - **Deviation/error notation**: Use `\erf{...}` for deviations of estimates/estimators from their estimands; use `\devn(...)` for all other deviations (e.g., observations from population means)
+- **Estimators of vector estimands**: the estimator symbol (e.g. `\hat`,
+  `\bar`, `\tilde`) goes on top of the vector symbol, not inside it —
+  write `\hat{\v{\mu}}`, not `\v{\hat\mu}`. The hat sits on top of the
+  already-vectorized symbol.
 
 matrix-product helper macros:
 

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -50,6 +50,9 @@ Before committing any `.qmd`, `.R`, or config file change:
 - Key macros: `\E{Y|X=x}`, `\ba`/`\ea`, `\tp{v}`, `\b`, `\g`, `\a`, `\devn(...)`, `\erf{...}`
 - Include every intermediate step in derivations — do not skip steps
 - Color coding: `\red{...}` for focal/extra terms, `\blue{...}` for shared terms
+- Estimators of vector estimands: the estimator symbol (e.g. `\hat`) goes
+  on top of the vector symbol, not inside it — write `\hat{\v{\mu}}`, not
+  `\v{\hat\mu}`. (Same for `\bar`, `\tilde`, etc.)
 - Ratios vs. factors:
   - Use the generic `\ratio`/`\ratiof` macro when a ratio's inputs are the **quantities themselves** (the odds, hazards, rates, etc.) — e.g. `\ratio(\odds_1, \odds_2)`, **not** `\ror(\odds_1, \odds_2)` — because the type of ratio is clear from the inputs.
   - Use the type-subscripted ratio macros (`\ror` for odds ratios, `\hazratio`/`\hr` for hazard ratios, `\rateratio`, `\riskratio`, `\prevratio`, `\cuhazratio`, …) only when the inputs are **covariate patterns** (e.g. `\ror(\vx,\vxs)`, `\hr(t\mid\vx:\vxs)`), where the subscript is needed to say which kind of ratio it is.

diff --git a/_subfiles/count-regression/_exr-prac-glm-score.qmd b/_subfiles/count-regression/_exr-prac-glm-score.qmd
@@ -69,7 +69,12 @@ weighted by $x_i$.
 
 More generally,
 these score equations say that the **residuals $(y_i - \hat\mu_i)$
-are orthogonal to each predictor column**.
+are [orthogonal](math-prereqs.qmd#def-orthogonal-vectors)
+to each predictor column**:
+for each predictor $j$, the residual vector $(\vy - \hat{\v{\mu}})$ satisfies
+$\tp{\vx_{(j)}}(\vy - \hat{\v{\mu}}) = 0$,
+where $\vx_{(j)} = (x_{1j}, \ldots, x_{nj})$ is the column of values of the
+$j$-th predictor across observations.
 This is the GLM analogue of the OLS normal equations.
 
 :::
diff --git a/_subfiles/count-regression/_sec-overdispersion.qmd b/_subfiles/count-regression/_sec-overdispersion.qmd
@@ -13,8 +13,8 @@ In practice, many count distributions will have a variance substantially larger
 
 A random variable $X$ is **overdispersed**
 relative to a model $\p(X=x)$ if
-if its empirical variance in a dataset is larger than
-the value is predicted by the fitted model $\hat{\p}(X=x)$.
+its empirical variance in a dataset is larger than
+the value predicted by the fitted model $\hat{\p}(X=x)$.
 
 ::::
 

diff --git a/_subfiles/count-regression/_sec_pois-reg_intro.qmd b/_subfiles/count-regression/_sec_pois-reg_intro.qmd
@@ -69,7 +69,7 @@ $$
 In contrast with the other covariates (represented by $\vX$),
 $t$ enters this expression with a $\log{}$ transformation
 and without a corresponding $\beta$ coefficient;
-in other words, $\logf{t}$ is an [offset term](poisson.qmd#def-offset).
+in other words, $\logf{t}$ is an [offset term](probability.qmd#def-offset).
 :::
 
 ---

diff --git a/_subfiles/count-regression/_sec_poisson_dx.qmd b/_subfiles/count-regression/_sec_poisson_dx.qmd
@@ -17,17 +17,17 @@ where $h$ is the "leverage" (which we will continue to leave undefined).
 #### Deviance residuals
 
 $$
-d_k = \text{sign}(y - \hat y)\left\{\sqrt{2[\ell_{\text{full}}(y) - \ell(\hat\beta; y)]}\right\}
+d_k = \signt(y - \hat y)\left\{\sqrt{2[\ell_{\text{full}}(y) - \ell(\hat\beta; y)]}\right\}
 $$
 
 :::{.callout-note}
 
-$$\text{sign}(x) \eqdef \frac{x}{|x|}$$
+$$\signt(x) \eqdef \frac{x}{|x|} \text{ for } x \neq 0, \qquad \signt(0) \eqdef 0$$
 In other words:
 
-* $\text{sign}(x) = -1$ if $x < 0$
-* $\text{sign}(x) = 0$ if $x = 0$
-* $\text{sign}(x) = 1$ if $x > 0$
+* $\signt(x) = -1$ if $x < 0$
+* $\signt(x) = 0$ if $x = 0$
+* $\signt(x) = 1$ if $x > 0$
 
 ::::{.content-hidden}
 

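The deviance-residual formula in the hunk above can be checked numerically in base R. This is a minimal sketch on simulated data; the variables `x`, `y`, and `fit` are hypothetical and not taken from the book. It assumes $\ell_{\text{full}}$ is the Poisson log-likelihood with each mean set equal to its observation.

```r
# Hypothetical simulated data, used only for illustration
set.seed(1)
x <- rnorm(50)
y <- rpois(50, lambda = exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = poisson)
mu_hat <- fitted(fit)

# Per-observation Poisson log-likelihoods:
# the saturated ("full") model sets each mean equal to its observation
ell_full <- dpois(y, lambda = y, log = TRUE)
ell_fit <- dpois(y, lambda = mu_hat, log = TRUE)

# Base R's sign() returns -1, 0, or 1, matching the three cases listed above;
# pmax() guards against tiny negative values caused by floating-point rounding
d <- sign(y - mu_hat) * sqrt(pmax(2 * (ell_full - ell_fit), 0))

# Compare with the deviance residuals reported by glm
all.equal(unname(d), unname(residuals(fit, type = "deviance")))
```
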
diff --git a/_subfiles/count-regression/_sec_poisson_inference.qmd b/_subfiles/count-regression/_sec_poisson_inference.qmd
@@ -1,26 +1,55 @@
 
 ### Confidence intervals for regression coefficients and rate ratios
 
-As usual:
+A Wald 95% confidence interval for a single coefficient $\beta_j$ is:
 
 $$
-\beta \in \left[\ci \right]
+\beta_j \in \left[\hat\beta_j \pm \ciradf{\hat\beta_j}\right]
 $$
 
-Rate ratios: exponentiate CI endpoints
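+where the interval's radius $\ciradf{\hat\beta_j}$ is $z_{1-\alpha/2}$ times the estimated standard error $\hse{\hat\beta_j}$, with $z_{1-\alpha/2} \approx 1.96$ for $\alpha = 0.05$.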
+
+Because exponentiation is a monotone increasing map from the log-rate scale
+to the rate scale, we obtain a confidence interval for the rate ratio $e^{\beta_j}$
+by exponentiating both endpoints:
 
 $$
-\exp{\beta} \in \left[\exp{\ci} \right]
+e^{\beta_j} \in
+  \left[
+    \exp{\hat\beta_j - \ciradf{\hat\beta_j}},\;
+    \exp{\hat\beta_j + \ciradf{\hat\beta_j}}
+  \right]
 $$
 
 ### Hypothesis tests for regression coefficients
 
+To test $H_0: \beta_j = \beta_0$ against a one- or two-sided alternative,
+compute the Wald $z$-statistic:
+
 $$
-z = \frac{\hat \beta - \beta_0}{\hse{\hb}}
+z = \frac{\hat \beta_j - \beta_0}{\hse{\hat\beta_j}}
 $$
 
-Compare $z$ or $|z|$ to the tails of the standard Gaussian distribution, according to the null hypothesis.
+and compare $z$ (one-sided) or $|z|$ (two-sided) to the tails of the
+standard Gaussian distribution.
+The most common null hypothesis is $H_0: \beta_j = 0$,
+i.e., that covariate $j$ is not associated with the outcome rate, given the other covariates in the model.
 
 ### Comparing nested models
 
-log(likelihood ratio) tests, as usual.
+To compare a smaller model $M_0$ (with $p_0$ parameters) to a larger model
+$M_1$ (with $p_1 > p_0$ parameters), use the likelihood ratio test statistic:
+
+$$
+G^2 = 2\bigl[\hat\ell_1 - \hat\ell_0\bigr]
+$$
+
+where $\hat\ell_1$ and $\hat\ell_0$ are the maximized log-likelihoods
+of $M_1$ and $M_0$ respectively.
+(Here the subscripts index the two *models*; they are unrelated to the
+scalar null value $\beta_0$ used in the Wald test above.)
+
+Under $H_0$ that the additional $p_1 - p_0$ parameters are all zero,
+$G^2 \dsim \chi^2_{p_1 - p_0}$.
+
+See @dobson4e [Chapter 9] and @vittinghoff2e [§8.1] for details.
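
The Wald interval, Wald test, and likelihood ratio test added above can be sketched in base R. This is a minimal illustration on simulated data; the covariates `x1` and `x2` and the models `m0` and `m1` are hypothetical and not part of the book's examples.

```r
# Hypothetical simulated data: x2 has no true effect on the rate
set.seed(2)
n <- 200
x1 <- rnorm(n)
x2 <- rbinom(n, size = 1, prob = 0.5)
y <- rpois(n, lambda = exp(0.2 + 0.4 * x1))

m0 <- glm(y ~ x1, family = poisson)       # smaller model
m1 <- glm(y ~ x1 + x2, family = poisson)  # larger model

# Wald 95% CI for the x1 coefficient, then for its rate ratio
est <- coef(m1)[["x1"]]
se <- sqrt(diag(vcov(m1)))[["x1"]]
z_crit <- qnorm(0.975)
ci_beta <- est + c(-1, 1) * z_crit * se
ci_rr <- exp(ci_beta)

# Two-sided Wald test of H0: beta_j = 0
z <- (est - 0) / se
p_wald <- 2 * pnorm(-abs(z))

# Likelihood ratio test comparing m0 to m1
G2 <- 2 * (as.numeric(logLik(m1)) - as.numeric(logLik(m0)))
df <- attr(logLik(m1), "df") - attr(logLik(m0), "df")
p_lrt <- pchisq(G2, df = df, lower.tail = FALSE)
```

The same quantities are available from `confint.default(m1)`, `summary(m1)`, and `anova(m0, m1, test = "Chisq")`, which is usually the more convenient route in practice.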
diff --git a/_subfiles/count-regression/_sec_zero-inflation.qmd b/_subfiles/count-regression/_sec_zero-inflation.qmd
@@ -31,12 +31,105 @@ $$
 ---
 
 ::: {#exr-zinf-pmf}
-Expand $P(Y=0|X=x,T=t)$, $P(Y=1|X=x,T=t)$ and $P(Y=y|X=x,T=t)$ into expressions involving $P(Z=1|X=x,T=t)$ and $P(Y=y|Z=0,X=x,T=t)$.
+Expand $P(Y=0|X=x,T=t)$, $P(Y=1|X=x,T=t)$ and $P(Y=y|X=x,T=t)$ into expressions involving $P(Z=1|X=x)$ and $P(Y=y|Z=0,X=x,T=t)$.
 :::
 
----
+::: {.solution}
+
+Let $\pi = \P(Z=1|X=x)$ and $\mu_0 = \Expp[Y|Z=0,X=x,T=t]$.
+
+**$P(Y=0)$:** $Y=0$ occurs either because $Z=1$ (always zero)
+or because $Z=0$ and the Poisson draw equals 0:
+
+$$
+\ba
+\P(Y=0|X=x,T=t)
+&= \P(Z=1|X=x) + \P(Z=0|X=x)\,\P(Y=0|Z=0,X=x,T=t)\\
+&= \pi + (1-\pi)\,e^{-\mu_0}
+\ea
+$$
+
+**$P(Y=1)$:** $Z=1$ can never produce $Y=1$, so:
+
+$$
+\P(Y=1|X=x,T=t)
+= (1-\pi)\,\P(Y=1|Z=0,X=x,T=t)
+= (1-\pi)\,\mu_0 e^{-\mu_0}
+$$
+
+**$P(Y=y)$ for $y \geq 1$:** Identical reasoning gives
+
+$$
+\P(Y=y|X=x,T=t)
+= (1-\pi)\,\frac{\mu_0^y e^{-\mu_0}}{y!}
+$$
+
+:::
+
+{{< slidebreak >}}
 
 ::: {#exr-zinf-moments}
 
-Derive the expected value and variance of $Y$, conditional on $X$ and $T$, as functions of $P(Z=1|X=x,T=t)$ and $\Expp[Y|Z=0,X=x,T=t]$.
+Derive the expected value and variance of $Y$, conditional on $X$ and $T$, as functions of $\pi = P(Z=1|X=x)$ and $\mu_0 = \Expp[Y|Z=0,X=x,T=t]$.
+:::
+
+::: {.solution}
+
+Let $\pi = \P(Z=1|X=x)$ and $\mu_0 = \Expp[Y|Z=0,X=x,T=t]$.
+
+**Expected value.** By the Law of Total Expectation
+(conditioning on $Z$, within the subpopulation $\{X=x, T=t\}$):
+
+$$
+\ba
+\Expp[Y|X=x,T=t]
+&= \Expp[Y|Z=1,X=x,T=t]\,\pi + \Expp[Y|Z=0,X=x,T=t]\,(1-\pi)\\
+&= 0 \cdot \pi + \mu_0(1-\pi)\\
+&= (1-\pi)\,\mu_0
+\ea
+$$
+
+The substitution $\Expp[Y|Z=0,X=x,T=t] = \mu_0$ follows immediately
+from the definition of $\mu_0$ above.
+
+**Variance.** We apply the Law of Total Variance, conditioning on $Z$.
+To reduce clutter, we suppress the $(X=x, T=t)$ conditioning in the
+intermediate steps below: every expectation and variance is taken within
+the subpopulation $\{X=x, T=t\}$, and we restore the explicit conditioning
+in the final line.
+
+$$
+\Var{Y} = \Expp[\Var{Y|Z}] + \Var{\Expp[Y|Z]}
+$$
+
+For the first term, since $\Var{Y|Z=1}=0$ and $\Var{Y|Z=0}=\mu_0$ (Poisson):
+
+$$
+\Expp[\Var{Y|Z}] = 0 \cdot \pi + \mu_0(1-\pi) = (1-\pi)\mu_0
+$$
+
+For the second term, $\Expp[Y|Z]$ takes the value 0 (with prob $\pi$)
+or $\mu_0$ (with prob $1-\pi$), so:
+
+$$
+\ba
+\Var{\Expp[Y|Z]}
+&= \pi(0 - (1-\pi)\mu_0)^2 + (1-\pi)(\mu_0 - (1-\pi)\mu_0)^2\\
+&= \pi(1-\pi)^2\mu_0^2 + (1-\pi)\pi^2\mu_0^2\\
+&= \pi(1-\pi)\mu_0^2[\,(1-\pi)+\pi\,]\\
+&= \pi(1-\pi)\mu_0^2
+\ea
+$$
+
+Combining:
+
+$$
+\Var{Y|X=x,T=t} = (1-\pi)\mu_0 + \pi(1-\pi)\mu_0^2
+= (1-\pi)\mu_0\bigl(1 + \pi\mu_0\bigr)
+$$
+
+Since $(1-\pi)\mu_0\bigl(1+\pi\mu_0\bigr) > (1-\pi)\mu_0 = \Expp[Y|X=x,T=t]$ whenever $0 < \pi < 1$ and $\mu_0 > 0$,
+a zero-inflated Poisson model always exhibits overdispersion relative to a Poisson model
+with the same mean.
+
 :::
diff --git a/chapters/count-regression.qmd b/chapters/count-regression.qmd
@@ -72,13 +72,17 @@ This content is adapted from:
 
 ::: notes
 There are alternatives to the Poisson model.
-Most notably, 
+Most notably,
 the [negative binomial model](probability.qmd#sec-nb-dist).
+:::
 
-We can still model $\mu$ as a function of $X$ and $T$ as before, 
-and we can combine this model with zero-inflation 
+The [negative binomial distribution](probability.qmd#sec-nb-dist)
+is a common alternative to the Poisson distribution for count outcomes.
+It adds a dispersion parameter that allows the variance to exceed the mean,
+making it more flexible when overdispersion is present.
+We can still model $\mu$ as a function of $X$ and $T$ as before,
+and we can combine this model with zero-inflation
 (as the conditional distribution for the non-zero component).
-:::
 
 ---
 
@@ -88,7 +92,13 @@ and we can combine this model with zero-inflation
 
 ## Quasipoisson
 
-An alternative to Negative binomial is the "quasipoisson" distribution. I've never used it, but it seems to be a method-of-moments type approach rather than maximum likelihood. It models the variance as $\Var{Y} = \mu\theta$, and estimates $\theta$ accordingly.
+An alternative to the negative binomial model is the quasi-Poisson approach.
+Instead of specifying a full probability distribution,
+it specifies only how the variance depends on the mean, and it is fit by a method-of-moments approach rather than maximum likelihood estimation.
+It models the variance as $\Var{Y} = \mu\theta$
+and estimates the dispersion parameter $\theta$ accordingly.
+This approach is simpler to implement, but without a full likelihood
+it does not support likelihood-based tools such as likelihood ratio tests or AIC.
 
 See `?quasipoisson` in R for more.
 

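The method-of-moments estimate of $\theta$ described above can be reproduced by hand in base R. This is a minimal sketch on simulated overdispersed counts; the data and the names `fit_p`, `fit_q`, and `theta_hat` are hypothetical. It uses the Pearson chi-square statistic divided by the residual degrees of freedom, which is the estimator that `summary.glm()` reports for quasi families.

```r
# Hypothetical overdispersed counts, simulated from a negative binomial
set.seed(3)
x <- rnorm(300)
y <- rnbinom(300, mu = exp(1 + 0.5 * x), size = 2)

fit_p <- glm(y ~ x, family = poisson)
fit_q <- glm(y ~ x, family = quasipoisson)

# Moment estimate of the dispersion: Pearson chi-square / residual df
theta_hat <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)
c(manual = theta_hat, glm = summary(fit_q)$dispersion)

# The coefficient estimates are the same under both fits;
# only the standard errors change, scaled by sqrt(theta_hat)
se_p <- summary(fit_p)$coefficients[, "Std. Error"]
se_q <- summary(fit_q)$coefficients[, "Std. Error"]
se_q / se_p
```
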
diff --git a/chapters/exr-needle-sharing-extensions.qmd b/chapters/exr-needle-sharing-extensions.qmd
@@ -1,45 +1,47 @@
-
 ```{r}
-library(MASS) #need this for glm.nb()
-glm1.nb = glm.nb(
-  data = needles,
-  shared_syr ~ age + sex + homeless*polydrug
+library(MASS) # for glm.nb()
+glm1_nb <- glm.nb(
+  formula = shared_syr ~ homeless +
+    age + sex + ethn + polydrug + hivstat + sexabuse + hplsns,
+  data = needles
 )
 
-equatiomatic::extract_eq(glm1.nb)
+equatiomatic::extract_eq(glm1_nb)
 ```
 
 ```{r}
 #| tbl-cap: "Negative binomial model for needle-sharing data"
 #| label: tbl-needles-nb
-summary(glm1.nb)
+summary(glm1_nb)
 ```
 
 ---
 
 ```{r}
 #| tbl-cap: "Poisson versus Negative Binomial Regression coefficient estimates"
 #| label: tbl-compare-poisson-nb
-tibble(name = names(coef(glm1)), poisson = coef(glm1), nb = coef(glm1.nb))
+tibble(name = names(coef(glm1)), poisson = coef(glm1), nb = coef(glm1_nb))
 ```
 
 #### zero-inflation
 
 ```{r}
-#| tbl-cap: "Zero-inflated poisson model"
+#| tbl-cap: "Zero-inflated Poisson model"
 #| label: tbl-zeroinf-poisson
 library(glmmTMB)
-zinf_fit1 = glmmTMB(
+zinf_pois <- glmmTMB(
   family = "poisson",
-  data  = needles,
-  formula = shared_syr ~ age + sex + homeless*polydrug,
-  ziformula = ~ age + sex + homeless + polydrug # fit won't converge with interaction
+  data = needles,
+  formula = shared_syr ~ homeless +
+    age + sex + ethn + polydrug + hivstat + sexabuse + hplsns,
+  # the zero-inflation submodel uses only the exposure, matching Vittinghoff's
+  # `inflate(i.homeless)` (@vittinghoff2e, §8.3.1)
+  ziformula = ~ homeless
 )
 
-zinf_fit1 |>
+zinf_pois |>
   parameters(exponentiate = TRUE) |>
   print_md()
-
 ```
 
 ::: notes
@@ -53,18 +55,30 @@ Another R package for zero-inflated models is [`pscl`](https://cran.r-project.or
 ```{r}
 #| tbl-cap: "Zero-inflated negative binomial model"
 #| label: tbl-zeroinf-nb
-library(glmmTMB)
-zinf_fit1 = glmmTMB(
+zinf_nb <- glmmTMB(
   family = nbinom2,
-  data  = needles,
-  formula = shared_syr ~ age + sex + homeless*polydrug,
-  ziformula = ~ age + sex + homeless + polydrug 
-  # fit won't converge with interaction
+  data = needles,
+  formula = shared_syr ~ homeless +
+    age + sex + ethn + polydrug + hivstat + sexabuse + hplsns,
+  ziformula = ~ homeless
 )
 
-zinf_fit1 |>
+zinf_nb |>
   parameters(exponentiate = TRUE) |>
   print_md()
-
 ```
 
+::: notes
+Both zero-inflated models keep the zero-inflation (structural-zero) submodel
+parsimonious — just the exposure, `ziformula = ~ homeless` —
+matching @vittinghoff2e's `inflate(i.homeless)` [§8.3.1].
+
+A richer zero-inflation submodel is poorly identified here.
+With the negative binomial in particular,
+the overdispersion parameter and the zero-inflation component
+both compete to explain the excess zeros,
+so adding several covariates to the submodel makes the logistic fit
+*separate*: its coefficients diverge toward $\pm\infty$, and the exponentiated estimates
+(odds ratios) run off to $0$ or $+\infty$.
+Keeping the submodel small avoids that.
+:::