revise count regression chapter#860
- Fix duplicate "if" and "value is predicted" typos in overdispersion definition
- Replace \mathbb{E} with \Expp macro in rate-ratio derivation for consistency
- Replace \text{sign} with \signt macro in deviance residual formula and callout
- Rewrite first-person language in Quasipoisson section
- Move NB models explanatory content out of notes-only block so it renders in HTML/PDF
- Add worked solutions for zero-inflation PMF and E/Var exercises
- Flesh out inference section with explicit CI, hypothesis test, and LRT formulas
Co-authored-by: Douglas Ezra Morrison <d-morrison@users.noreply.github.com>
|
Claude Code CLI (local session) is resolving the merge conflicts with main here. Paws off until done. |
…0516-0652 # Conflicts: # _subfiles/count-regression/_sec_poisson_RRs.qmd Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
|
Merge conflicts resolved by merging |
|
Claude finished @d-morrison's task in 3m 49s. Code Review: PR #860 (Revise count regression chapter)
Overall this is a clean, well-targeted PR. The typo fixes, macro substitutions, language rewrites, and worked solutions are all correct. A few small issues below.
|
fig-needles plotted y = shsyryn (binary yes/no indicator) but its caption reads "Rates of needle sharing" and the Poisson model below it fits shared_syr ~ ... + offset(log(nivdu)) — i.e., the rate of shared syringes per injection. Switch to y = shared_syr / nivdu (rate) with nivdu > 0 filter to avoid divide-by-zero, and label the y-axis. fig-needles-pois used #| tbl-cap with a fig- label; the chunk produces a plot, so swap to fig-cap to keep the cross-reference type consistent. https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
|
Claude finished @d-morrison's task in 7m 33s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
Overall the PR is well-targeted: typo fixes, macro substitutions, language rewrites, and worked solutions are all correct. Four issues found. 1.
|
Data dictionary improvements:
- Add labels for the nine variables that previously had no label (id, age, dprsn_dx, sexabuse, shared_syr, hplsns, nivdu, sqrtnivd, shsyr), pulling descriptions from the rmb package's documentation (https://d-morrison.github.io/rmb/man/needle_sharing.html).
- Fix typo on sqrtninj: "sqrt(No. of infections in 30 days)" → "sqrt(No. of injections in 30 days)" (verified against .dta values; sqrtninj is the square root of nivdu, the injection count).
- Tighten and standardise wording on the other labels.
- Add a short prose blurb naming the dataset source (RMB2e chapter 8, UCSF companion website) and sample size (n = 128, 17 variables).
New demographics table (tbl-needles-demographics):
- Reports N, age (median, Q1–Q3), sex, ethnicity, HIV status, homelessness, polydrug use, and the two count variables (injections and shared syringes per 30 days), all on the analysis sample (Trans-sex singleton excluded).
- Verified locally: n = 127 after Trans exclusion, age median 41 (IQR 35-48), 24% female, 59% White, 7% HIV+, 50% homeless.
https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
|
Claude finished @d-morrison's task in 4m 57s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
The two latest commits ( Still open:
|
…ebreaks, dup label
_sec_poisson_inference.qmd:
- Make the coefficient-CI formula's subscripts internally consistent: expand
\ci to \hat\beta_j \pm \ciradf{\hat\beta_j} (matching the \beta_j on the
LHS) instead of using the unsubscripted \ci macro.
- Rewrite the rate-ratio CI as two explicit endpoints
[\exp{\hat\beta_j - rad}, \exp{\hat\beta_j + rad}] — the prose says
"exponentiating both endpoints" but \exp{\ci} expands to a single
exp{...} expression with the ± inside the braces, not two endpoints.
- Switch \dot\sim to the project's \dsim macro on the LRT line so the
spacing matches the rest of the codebase.
_sec_zero-inflation.qmd:
- exr-zinf-moments exercise text defined π conditional on T, but the
model definition and the solution both define π conditional only on X.
Drop T=t from the exercise prompt so a student's derivation cannot
legitimately diverge from the solution.
- Replace the new --- slide separator above the exercise with
{{< slidebreak >}} per CLAUDE.md.
exr-needle-sharing.qmd:
- Replace the --- I added between the new demographics table and the raw
data table with {{< slidebreak >}}.
- Clarify the sqrtninj label: sqrtninj and sqrtnivd are numerically
identical columns in the .dta (verified: zero rows differ across all
128), so note the duplication instead of giving the two columns
identical descriptions.
https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
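For readers following the inference fixes in this commit, the three formulas can be sketched in generic notation. This is an illustration under assumptions: it uses the standard symbols $z_{1-\alpha/2}$ and $\widehat{\mathrm{se}}$ in place of the project's `\ci` and `\ciradf` macros (whose expansions are not shown in this thread), and $d$ is an assumed name for the difference in the number of free parameters between the two nested models.

```latex
\begin{aligned}
% Wald interval for a coefficient on the log scale
\beta_j &\in \left[\hat\beta_j - z_{1-\alpha/2}\,\widehat{\mathrm{se}}(\hat\beta_j),\;
                  \hat\beta_j + z_{1-\alpha/2}\,\widehat{\mathrm{se}}(\hat\beta_j)\right] \\[4pt]
% Rate-ratio interval: exponentiate each endpoint separately
e^{\beta_j} &\in \left[\exp\!\left\{\hat\beta_j - z_{1-\alpha/2}\,\widehat{\mathrm{se}}(\hat\beta_j)\right\},\;
                      \exp\!\left\{\hat\beta_j + z_{1-\alpha/2}\,\widehat{\mathrm{se}}(\hat\beta_j)\right\}\right] \\[4pt]
% Likelihood ratio statistic comparing nested models M_0 (null) and M_1
G^2 &= 2\left[\hat\ell_1 - \hat\ell_0\right] \;\dot\sim\; \chi^2_{d} \quad \text{under } M_0
\end{aligned}
```

Writing the rate-ratio interval as two separate exponentials is what distinguishes it from a single `exp{...}` with a ± inside the braces, which is the rendering problem the commit describes.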
|
Claude encountered an error after 3m 45s —— View job I'll analyze this and get back to you. |
|
Claude finished @d-morrison's task in 3m 7s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
Commit Resolved
One remaining issue
The model definition (line 6) specifies A student following the exercise text literally would carry T through their derivation for the π component and arrive at an expression different from the solution. The prior review at 08:24:52Z explicitly flagged both lines 34 and 73 for this fix; line 73 was addressed but line 34 was not. Fix: On line 34, change Everything else looks good
|
The needle-sharing data has five ethnicity categories with n = 1 or 2 participants (Asian, Filipino, Indian, "Indian & White", "White & Hispa"), which makes the data dictionary and demographics tables noisy and would make any ethn-stratified model have empty/near-empty cells. Collapse those into "Other" via forcats::fct_lump_min(min = 5). Verified: post-lump ethn has 4 levels — White (76), AA (36), Hispanic (10), Other (6). Lump correctly bundles the five rare categories. https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
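A minimal sketch of what `forcats::fct_lump_min()` does, using a toy factor rather than the needle-sharing data (the level names and counts below are hypothetical and chosen only for illustration):

```r
library(forcats)

# Toy factor (hypothetical data, not the needle-sharing sample):
# two common levels ("a", "b") and two rare ones ("c", "d").
toy <- factor(c(rep("a", 8), rep("b", 6), "c", "d", "d"))

# Levels observed fewer than `min` times are pooled into "Other".
lumped <- fct_lump_min(toy, min = 5)

# Inspect the resulting level counts.
table(lumped)
```

The common levels keep their original order and the pooled level is appended last, which is why a later `relevel()` call is still needed to choose the reference category.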
|
Claude finished @d-morrison's task in 2m 28s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
Commit Still open:
|
sjPlot::plot_model(type="pred") doesn't expose the offset, so its y-axis was on the *count* scale and didn't match @fig-needles (which plots shared_syr / nivdu). Replace with a direct prediction grid where nivdu = 1 forces log(nivdu) = 0 in the offset; predict(type="response") then returns the Poisson rate per injection. Overlay observed shared_syr / nivdu points (sized by nivdu) and the fitted rate lines, faceted by homeless × polydrug, coloured by sex — same panel structure as @fig-needles. Verified locally: pred_grid 400 rows, rates 4.5e-9 to 0.066; observed data 121 complete-case rows after dropping NAs in homeless/polydrug/ sex/age (also removes the previous stray "homeless: NA" facet). https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
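The `nivdu = 1` trick described in this commit can be sketched with base R alone. The data below are simulated and the variable names `toy`, `shared`, and `grid` are hypothetical; the point is only that setting the exposure to 1 makes the offset `log(nivdu)` equal to 0, so `predict(type = "response")` returns a rate per unit of exposure.

```r
set.seed(1)

# Simulated data (hypothetical; not the needle-sharing sample).
n <- 200
toy <- data.frame(
  age   = runif(n, 20, 60),
  nivdu = sample(1:60, n, replace = TRUE)  # exposure: number of injections
)
# The per-injection rate depends on age; the expected count scales with nivdu.
toy$shared <- rpois(n, lambda = toy$nivdu * exp(-2 - 0.02 * toy$age))

# Poisson model with the exposure entered as an offset inside the formula.
fit <- glm(shared ~ age + offset(log(nivdu)), family = poisson, data = toy)

# Prediction grid with nivdu = 1, so the offset term log(nivdu) is 0.
grid <- data.frame(age = seq(20, 60, by = 10), nivdu = 1)
grid$rate <- predict(fit, newdata = grid, type = "response")

# The same rates computed by hand from the fitted coefficients.
by_hand <- exp(coef(fit)[["(Intercept)"]] + coef(fit)[["age"]] * grid$age)
all.equal(unname(grid$rate), by_hand)
```

Because the offset sits inside the formula, `predict()` picks up `nivdu` from `newdata`; a grid that omitted the `nivdu` column would fail rather than silently predict on the count scale.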
|
Code Review: PR #860 — Revise count regression chapter Tasks
|
exr-prac-glm-score's solution invokes orthogonality of the residual vector against each predictor column. def-orthogonal-vectors already exists in the math-prereqs linear-algebra section and matches the usage exactly, so cross-link to it and add the explicit inner-product equation so a student following the link back can verify the claim. Also shorten fig-needles-pois's fig-cap below the 80-char limit. https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
|
Claude finished @d-morrison's task in 4m 43s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
Six of the seven previously-reported issues remain resolved from commit Still open:
|
|
@claude address reviews |
|
👀 Picked up by workflow run #26835004505. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that. |
|
@claude add a graph of sharedsyr vs nivdu |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Shows the raw count relationship between shared syringes and injections, complementing fig-needles (which shows the rate shared_syr/nivdu). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
👀 Picked up by workflow run #26835077308. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that. |
|
No new comments. The request is already fully addressed. Here's what happened and the current state of the PR: The graph of
#| fig-cap: "Shared syringes vs. number of injections (raw counts)"
#| label: fig-needles-raw
needles |>
  ggplot(aes(x = nivdu, y = shared_syr, shape = sex, col = ethn)) +
  geom_point(alpha = .5) +
  facet_grid(cols = vars(polydrug), rows = vars(homeless)) +
  labs(
    x = "Number of injections (in 30 days)",
    y = "Number of shared syringes (in 30 days)"
  ) +
  theme(legend.position = "bottom")
That session also fixed the last outstanding review finding: the
Posted by @claude post-step from workflow run |
|
Claude Code CLI (local session) is working on this — paws off until I'm done. |
- fig-needles-raw: add dashed x=y line (shared syringes can't exceed injections) and wrap the bottom legends onto multiple rows so they don't get clipped
- add fig-needles-dag: hypothesized causal structure (demographics + social factors -> nivdu -> shared_syr)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
|
Code Review: PR #860 — Revise count regression chapter Tasks
|
Per the latest @claude review's four findings:
1. Intro covariate list (line ~55): add HIV serostatus alongside age, sex, housing status, and drug-use patterns; hivstat is in glm1 now and was conspicuously missing from the overview sentence.
2. Stale "simplified DAG" note (line ~187): the previous note said a fuller model could add HIV status, depression, hopelessness, but the DAG already shows all three (and glm1 includes hivstat). Replaced with a note that explains which DAG nodes the fitted model does and doesn't condition on, and points to the mental-health nodes as an extension exercise.
3. Sex IRR paragraph (line ~426): drop the hard-coded adjustment list ("age, housing status, and drug-use pattern") and use the generic "the other covariates" phrasing the homelessness and polydrug paragraphs already use; this is now consistent and won't drift if the covariate set changes again.
4. Zero-inflation ziformulas (extensions, lines 36 and 61): drop hivstat from the zero-inflation submodel. With only ~8/128 HIV+ participants this risks the same convergence failure the prior code explicitly avoided for the homeless*polydrug interaction. Keep hivstat in the count formula and add a comment recording the rationale.
https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
|
Claude finished @d-morrison's task in 3m 19s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
Commit All four previously-open issues resolved in
|
| # | Issue | Status |
|---|---|---|
| 1 | \exp{\ci} renders as single expression, not two endpoints | ✓ |
| 2 | Subscript mismatch \beta_j \in [\ci] | ✓ |
| 3 | \dsim macro instead of raw \dot\sim | ✓ |
| 4 | exr-zinf-moments T-conditioning (line 73) | ✓ |
| 5 | exr-zinf-pmf T-conditioning (line 34) | ✓ |
| 6 | {{< slidebreak >}} in _sec_zero-inflation.qmd | ✓ |
| 7 | {{< slidebreak >}} in exr-needle-sharing.qmd (×2) | ✓ |
| 8 | sqrtninj duplicate label clarified | ✓ |
| 9 | (\vy - \hat{\v{\mu}}) hat-atop-vector notation | ✓ |
| 10 | LTE inner expectations carry (X=x,T=t) explicitly | ✓ |
| 11 | β₀ notation collision between Wald/LRT sections | ✓ |
| 12 | μ vs μ₀ inconsistency across adjacent solutions | ✓ |
| 13 | fig-needles-raw div format | ✓ |
| 14 | offset(log(nivdu)) inside formula | ✓ |
| 15 | Overdispersion conclusion uses specific-value conditioning | ✓ |
| 16 | Vector estimator notation rule documented in CLAUDE.md | ✓ |
| 17 | Two raw --- slide breaks (commit ca62297) | ✓ |
| 18 | tbl-needles-count div format | ✓ |
| 19 | Unreferenced tables | ✓ |
| 20 | shsyr label clarified | ✓ |
| 21 | fig-needles rate vs count narrative inconsistency | ✓ |
| 22 | Uncited Introduction claims | ✓ |
| 23 | Render-breaking tbl-compare-poisson-nb tibble mismatch | ✓ |
| 24 | Introduction omits hivstat from covariate list | ✓ (commit a95e486) |
| 25 | Stale ::: notes text contradicts expanded DAG | ✓ (commit a95e486) |
| 26 | Sex IRR paragraph names adjustment set without hivstat | ✓ (commit a95e486) |
| 27 | hivstat in ziformula: convergence risk | ✓ (commit a95e486) |
Everything looks good
- Overdispersion typo fixes ("if if", "value is predicted") correct.
- `\signt` and `\Expp` macro substitutions correct.
- First-person rewrite in Quasipoisson section reads well; NB content moved out of `::: notes` correct.
- Zero-inflation PMF derivation ($P(Y=0|X,T) = \pi + (1-\pi)e^{-\mu_0}$) and Law of Total Variance derivation ($\text{Var}[Y|X=x,T=t] = (1-\pi)\mu_0(1+\pi\mu_0)$) both mathematically correct throughout.
- LRT formula: $G^2 = 2[\hat\ell_1 - \hat\ell_0]$ with `\dsim`, M₁/M₀ naming, and parenthetical distinguishing model-index subscripts from the scalar Wald null β₀. Correct.
- Rate-ratio CI uses two explicit endpoints; log-scale CI uses `±` shorthand; both internally consistent.
- `glm1` formula (`shared_syr ~ age + sex + homeless + polydrug + hivstat`, no offset) consistent with Vittinghoff §8.3.1 and mediator argument. ✓
- `glm1.nb` formula now matches `glm1` (both have `hivstat`); `tbl-compare-poisson-nb` tibble lengths equal. ✓
- `ziformula = ~ age + sex + homeless + polydrug` (no `hivstat`) in both ZIP and ZINB fits; explanatory comments present. ✓
- `pred_grid` includes `hivstat = levels(hivstat)` for full prediction surface; `predict(type="response")` returns counts directly. ✓
- `fct_lump_min(min = 5)` and `relevel(ref = "White")` correctly ordered.
- `demog_data` (`gtsummary::tbl_summary`) scoped before Trans removal; demographics table caption notes "all participants". ✓
- DAG structure expanded to all 9 covariate nodes; caption and `::: notes` block accurately describe which nodes the model does and doesn't condition on. ✓
- `\hat{\v{\mu}}` (hat atop tilde-vector symbol) correct per documented convention in CLAUDE.md.
- `@tsui2009hcv` correctly hedged as "possible (though unconfirmed)"; BibTeX entry complete. ✓
- All `{{< slidebreak >}}` shortcodes in new content correct; all remaining `---` markers in the file are pre-existing.
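For reference, the zero-inflation results checked in this review can be reproduced in a few lines. This is a sketch under assumptions: $Z$ is a hypothetical name for the latent structural-zero indicator with $P(Z=1)=\pi$, the count is $\text{Pois}(\mu_0)$ when $Z=0$, and the conditioning on $(X=x, T=t)$ is suppressed to keep the lines short.

```latex
\begin{aligned}
P(Y=0) &= P(Z=1) + P(Z=0)\,P(Y=0 \mid Z=0) = \pi + (1-\pi)e^{-\mu_0} \\[4pt]
P(Y=y) &= (1-\pi)\,\frac{\mu_0^{y} e^{-\mu_0}}{y!}, \qquad y = 1, 2, \ldots \\[4pt]
\mathrm{E}[Y] &= \mathrm{E}\big[\mathrm{E}[Y \mid Z]\big]
              = \mathrm{E}\big[(1-Z)\mu_0\big] = (1-\pi)\mu_0 \\[4pt]
\mathrm{Var}[Y] &= \mathrm{E}\big[\mathrm{Var}(Y \mid Z)\big] + \mathrm{Var}\big(\mathrm{E}[Y \mid Z]\big) \\
                &= (1-\pi)\mu_0 + \mu_0^{2}\,\pi(1-\pi) \\
                &= (1-\pi)\mu_0\,(1 + \pi\mu_0)
\end{aligned}
```

Since $\mathrm{Var}[Y]/\mathrm{E}[Y] = 1 + \pi\mu_0 > 1$ whenever $\pi > 0$ and $\mu_0 > 0$, zero inflation by itself produces overdispersion relative to a Poisson model.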
|
Claude Code CLI (local session) is working on this — paws off until I'm done. |
- Migrate def-offset into the rendered count chapter and link it with @def-offset (the old poisson.qmd#def-offset target is an orphan chapter that never renders, so the link was dead).
- tbl-zeroinf-nb: reduce the zero-inflation submodel to ziformula = ~ 1. With all four covariates the NB overdispersion and the zero-inflation component compete to explain the zeros, the logistic submodel separates, and the odds ratios run off to +/-Inf. The DAG also gives no covariates to a structural-zero class. Add a note explaining both reasons.
- Put shared_syr (count) y-axes on a pseudo_log scale in fig-needles, fig-needles-raw, and fig-needles-pois (handles the many zeros). For fig-needles-raw, transform both axes so the y = x reference line stays straight.
- Un-nest aes() out of ggplot() in those three figures.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
|
Claude finished @d-morrison's task in 3m 55s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
All 27 prior findings remain resolved. Commit 1. DAG argument in ZINB
|
| # | Issue | Status |
|---|---|---|
| 1 | \exp{\ci} renders as single expression, not two endpoints | ✓ |
| 2 | Subscript mismatch \beta_j \in [\ci] | ✓ |
| 3 | \dsim macro instead of raw \dot\sim | ✓ |
| 4–5 | T-conditioning in zero-inflation exercises | ✓ |
| 6–7 | {{< slidebreak >}} in both files | ✓ |
| 8 | sqrtninj duplicate label clarified | ✓ |
| 9 | \hat{\v{\mu}} hat-atop-vector notation | ✓ |
| 10 | LTE inner expectations carry (X=x,T=t) | ✓ |
| 11 | β₀ notation collision Wald/LRT | ✓ |
| 12 | μ vs μ₀ inconsistency | ✓ |
| 13 | fig-needles-raw div format | ✓ |
| 14 | offset inside formula | ✓ |
| 15 | Overdispersion conclusion conditioning | ✓ |
| 16 | Vector estimator convention in CLAUDE.md | ✓ |
| 17–18 | Slide breaks and tbl-needles-count div format | ✓ |
| 19 | Unreferenced tables | ✓ |
| 20 | shsyr label clarified | ✓ |
| 21 | fig-needles rate vs count narrative inconsistency | ✓ |
| 22 | Uncited Introduction claims | ✓ |
| 23 | tbl-compare-poisson-nb tibble mismatch | ✓ |
| 24 | Introduction omits hivstat from covariate list | ✓ |
| 25 | Stale ::: notes text contradicts expanded DAG | ✓ |
| 26 | Sex IRR paragraph names adjustment set without hivstat | ✓ |
| 27 | hivstat in ZINB ziformula: convergence risk | ✓ |
Rebuild the needle-sharing DAG to depict structure among the covariates:
- Birth-fixed variables (age, sex, ethn) stay exogenous.
- The six time-varying variables (homeless, polydrug, hivstat, sexabuse, dprsn_dx, hplsns) are joined pairwise by bidirected edges (<->), denoting associations whose causal direction is unknown.
Lay the time-varying nodes on a hexagon and render with ggdag so the bidirected web is legible; explain the bidirected edges in the caption and notes. Add "bidirected" and "leftrightarrow" to WORDLIST.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
|
Claude finished @d-morrison's task in 7m 18s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
All 27 findings from prior rounds remain resolved. Commit Still open from prior round:
|
| # | Issue | Status |
|---|---|---|
| 1 | \exp{\ci} renders as single expression, not two endpoints | ✓ |
| 2 | Subscript mismatch \beta_j \in [\ci] | ✓ |
| 3 | \dsim macro instead of raw \dot\sim | ✓ |
| 4–5 | T-conditioning in zero-inflation exercises | ✓ |
| 6–7 | {{< slidebreak >}} in both files | ✓ |
| 8 | sqrtninj duplicate label clarified | ✓ |
| 9 | \hat{\v{\mu}} hat-atop-vector notation | ✓ |
| 10 | LTE inner expectations carry (X=x,T=t) | ✓ |
| 11 | β₀ notation collision Wald/LRT | ✓ |
| 12 | μ vs μ₀ inconsistency | ✓ |
| 13 | fig-needles-raw div format | ✓ |
| 14 | offset(log(nivdu)) inside formula | ✓ |
| 15 | Overdispersion conclusion conditioning | ✓ |
| 16 | Vector estimator convention in CLAUDE.md | ✓ |
| 17–18 | Slide breaks and tbl-needles-count div format | ✓ |
| 19 | Unreferenced tables | ✓ |
| 20 | shsyr label clarified | ✓ |
| 21 | fig-needles rate vs count narrative inconsistency | ✓ |
| 22 | Uncited Introduction claims | ✓ |
| 23 | tbl-compare-poisson-nb tibble mismatch | ✓ |
| 24 | Introduction omits hivstat from covariate list | ✓ |
| 25 | Stale ::: notes text contradicts expanded DAG | ✓ |
| 26 | Sex IRR paragraph names adjustment set without hivstat | ✓ |
| 27 | hivstat in ZINB ziformula: convergence risk | ✓ |
|
@claude review again — addressed: dropped the zero-inflation covariates in |
|
👀 Picked up by workflow run #26909201792. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that. |
- The offset reference pointed at poisson.qmd#def-offset, but poisson.qmd is
include-only (pulled into probability.qmd) and has no standalone page. Point
the link at probability.qmd#def-offset, where the existing def-offset anchor
actually renders. (Earlier I had added a local def-offset definition; that
duplicated the anchor, so this replaces it with the corrected link.)
- Brace-protect the C in "hepatitis {C}" in the tsui2009hcv title so
sentence-case citation styles don't lowercase it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
|
Claude finished @d-morrison's task in 6m 56s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
All 27 prior findings remain resolved. The latest commit ( Latest commit (
|
| # | Issue | Status |
|---|---|---|
| 1 | \exp{\ci} renders as single expression | ✓ |
| 2 | Subscript mismatch \beta_j \in [\ci] | ✓ |
| 3 | \dsim macro | ✓ |
| 4–5 | T-conditioning in zero-inflation exercises | ✓ |
| 6–7 | {{< slidebreak >}} in both files | ✓ |
| 8 | sqrtninj duplicate label | ✓ |
| 9 | \hat{\v{\mu}} hat-atop-vector | ✓ |
| 10 | LTE inner expectations carry (X=x,T=t) | ✓ |
| 11 | β₀ notation collision Wald/LRT | ✓ |
| 12 | μ vs μ₀ inconsistency | ✓ |
| 13 | fig-needles-raw div format | ✓ |
| 14 | offset inside formula | ✓ |
| 15 | Overdispersion conclusion conditioning | ✓ |
| 16 | Vector estimator convention in CLAUDE.md | ✓ |
| 17–18 | Slide breaks and tbl-needles-count div format | ✓ |
| 19 | Unreferenced tables | ✓ |
| 20 | shsyr label clarified | ✓ |
| 21 | fig-needles rate vs count narrative inconsistency | ✓ |
| 22 | Uncited Introduction claims | ✓ |
| 23 | tbl-compare-poisson-nb tibble mismatch | ✓ |
| 24 | Introduction omits hivstat from covariate list | ✓ |
| 25 | Stale ::: notes text contradicts expanded DAG | ✓ |
| 26 | Sex IRR paragraph names adjustment set without hivstat | ✓ |
| 27 | hivstat in ZINB ziformula: convergence risk | ✓ |
Everything else looks good
- `97a3a72`: `probability.qmd#def-offset` is valid (poisson.qmd included at line 318); duplicate local anchor correctly removed; `{C}` brace-protection in BibTeX is correct. ✓
- `e433d8e` DAG expansion: `timevarying` (6 elements) matches `hex_x`/`hex_y` (both 6 elements); `setNames()` call is safe; `<->` syntax is supported by dagitty. ✓
- `glm1.nb` formula (`shared_syr ~ age + sex + homeless + polydrug + hivstat`) matches `glm1`; `tbl-compare-poisson-nb` tibble lengths are equal. ✓
- ZIP `zinf_fit1` and ZINB `zinf_fit_nb` are correctly named and independently scoped; no cross-contamination. ✓
- Overdispersion, `\signt`/`\Expp` substitutions, first-person rewrite, NB extraction all correct throughout. ✓
|
Heads-up on CI: |
|
@claude review |
|
👀 Picked up by workflow run #26915836817. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that. |
…structure
Recast the needle-sharing DAG around homelessness as the exposure of interest (following Vittinghoff §8.3.1) and the cross-sectional study design:
- Mark `homeless` as exposure and `shared_syr` as outcome.
- Cross-sectional => the current covariates can't be ordered causally; each is drawn as a proxy for an unobserved *past state* (`U_x`, hollow node) that causes both the current measurement and the 30-day outcome. These latent past states are the confounders.
- `nivdu` (injection frequency) is the only genuine mediator: a shared syringe is a subset of injections.
- Add a conditioned-on study-inclusion node (`selected`); several covariates affect being sampled, so conditioning on it induces selection bias.
- All directed edges (no bidirected), which also removes the ggdag double-arrow rendering ambiguity from the prior version.
Update the caption and notes to explain the cross-sectional reading, the single mediator, the latent-past confounders, and selection bias.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
|
Claude finished @d-morrison's task in 5m 41s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
All 27 prior findings remain resolved. The latest commit ( Still open (eighth consecutive round): ZINB
|
| # | Issue | Status |
|---|---|---|
| 1 | \exp{\ci} two explicit endpoints | ✓ |
| 2 | Subscript mismatch \beta_j \in [\ci] | ✓ |
| 3 | \dsim macro | ✓ |
| 4–5 | T-conditioning in zero-inflation exercises | ✓ |
| 6–7 | {{< slidebreak >}} in both files | ✓ |
| 8 | sqrtninj duplicate label | ✓ |
| 9 | \hat{\v{\mu}} hat-atop-vector notation | ✓ |
| 10 | LTE inner expectations carry (X=x,T=t) | ✓ |
| 11 | β₀ notation collision Wald/LRT | ✓ |
| 12 | μ vs μ₀ inconsistency | ✓ |
| 13 | fig-needles-raw div format | ✓ |
| 14 | offset(log(nivdu)) inside formula | ✓ |
| 15 | Overdispersion conclusion conditioning | ✓ |
| 16 | Vector estimator convention in CLAUDE.md | ✓ |
| 17–18 | Slide breaks and tbl-needles-count div format | ✓ |
| 19 | Unreferenced tables | ✓ |
| 20 | shsyr label clarified | ✓ |
| 21 | fig-needles rate vs count narrative inconsistency | ✓ |
| 22 | Uncited Introduction claims | ✓ |
| 23 | tbl-compare-poisson-nb tibble mismatch | ✓ |
| 24 | Introduction omits hivstat from covariate list | ✓ |
| 25 | Stale ::: notes text contradicts expanded DAG | ✓ |
| 26 | Sex IRR paragraph omits hivstat from adjustment list | ✓ |
| 27 | hivstat in ZINB ziformula convergence risk | ✓ |
Following Vittinghoff §8.3.1 (which models shared_syr on homelessness) and the cross-sectional DAG, recenter the analysis on homelessness as the exposure of interest:
- Fit shared_syr ~ homeless + age + sex + ethn + polydrug + hivstat + sexabuse + hplsns: homelessness as exposure, the other measured covariates as proxies for the latent past confounders (all except the mediator nivdu). Add sexabuse to the data prep. Exclude dprsn_dx, whose 1/5 coding is undocumented in the source data (the book uses only homeless).
- Headline the homelessness IRR in the interpretation (adjusted ~6, vs Vittinghoff's unadjusted ~3.3); add ethn/sexabuse/hplsns coefficients.
- Recast the fitted-model figure as the homeless-vs-not contrast over age (other covariates at reference), on the pseudo-log scale.
- NB/ZIP/ZINB use the same count model; both zero-inflation submodels use ziformula = ~ homeless, matching Vittinghoff's inflate(i.homeless). This also resolves the review finding that the old ZINB note (ziformula = ~ 1) contradicted the ZIP submodel: both are now consistent and book-faithful.
- Clean up pre-existing = / glm1.nb lint in the rewritten extensions chunks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
|
Claude finished @d-morrison's task in 5m 53s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
Commit 1. Introduction sentence omits
|
| # | Issue | Status |
|---|---|---|
| 1 | \exp{\ci} renders as single expression, not two endpoints | ✓ |
| 2 | Subscript mismatch \beta_j \in [\ci] | ✓ |
| 3 | \dsim macro instead of raw \dot\sim | ✓ |
| 4–5 | T-conditioning in zero-inflation exercises | ✓ |
| 6–7 | {{< slidebreak >}} in both files | ✓ |
| 8 | sqrtninj duplicate label clarified | ✓ |
| 9 | \hat{\v{\mu}} hat-atop-vector notation | ✓ |
| 10 | LTE inner expectations carry (X=x,T=t) | ✓ |
| 11 | β₀ notation collision Wald/LRT | ✓ |
| 12 | μ vs μ₀ inconsistency | ✓ |
| 13 | fig-needles-raw div format | ✓ |
| 14 | offset(log(nivdu)) inside formula | ✓ |
| 15 | Overdispersion conclusion conditioning | ✓ |
| 16 | Vector estimator convention in CLAUDE.md | ✓ |
| 17–18 | Slide breaks and tbl-needles-count div format | ✓ |
| 19 | Unreferenced tables | ✓ |
| 20 | shsyr label clarified | ✓ |
| 21 | fig-needles rate vs count narrative inconsistency | ✓ |
| 22 | Uncited Introduction claims | ✓ |
| 23 | tbl-compare-poisson-nb tibble mismatch | ✓ |
| 24 | Introduction omits hivstat from covariate list | ✓ |
| 25 | Stale ::: notes text contradicts expanded DAG | ✓ |
| 26 | Sex IRR paragraph omits hivstat from adjustment list | ✓ |
| 27 | hivstat in ZINB ziformula: convergence risk | ✓ |
Everything else in 7d2fcde looks good
- `sexabuse` correctly coded `case_match(1 ~ "yes", 0 ~ "no") |> factor() |> relevel(ref = "no")`. ✓
- `ref_lvl()` helper correctly picks the first (reference) level from `glm1$model`. ✓
- `pred_grid` correctly holds sex, ethn, polydrug, hivstat, sexabuse at reference, hplsns at its median, and expands over both levels of homeless. ✓
- `predict(glm1, newdata = pred_grid, type = "response")` with no offset correctly returns expected counts directly. ✓
- `glm1_nb` formula (extensions, line 4) matches `glm1` exactly; `tbl-compare-poisson-nb` lengths are equal. ✓
- Both `zinf_pois` and `zinf_nb` use `ziformula = ~ homeless`, consistent with each other and with Vittinghoff. ✓
- DAG `::: notes` block (lines 219–246) correctly explains homelessness as focal exposure, `nivdu` as mediator, and cross-sectional reading of remaining covariates. ✓
- Model prose at lines 407–413 correctly lists all eight covariates and explains the exclusion of `dprsn_dx`. ✓
- Homelessness IRR interpretation correctly notes the unadjusted vs adjusted comparison to Vittinghoff's ≈ 3.3. ✓
- `renv.lock` changes (bit64 4.8.0→4.8.2, clipr 0.8.0→0.8.1, cpp11 0.5.4→0.5.5) are minor version bumps; no incompatibilities expected. ✓
- All `{{< slidebreak >}}` shortcodes in new content correct; remaining `---` markers in both files are pre-existing. ✓
|
@claude review again — substantial reframe pushed:
|
|
👀 Picked up by workflow run #26972168642. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that. |
|
@claude review |
|
👀 Picked up by workflow run #27039677878. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that. |
|
Claude finished @d-morrison's task in 3m 51s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
All 27 findings from prior review rounds remain resolved. Two findings from the last round (2026-06-04T18:42:09Z) remain open — both introduced by commit 1. Introduction sentence still omits
|
|
Claude finished @d-morrison's task in 4m 33s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
The latest commit ( Two issues flagged in the prior round (2026-06-05T21:43:28Z) remain open and unaddressed. 1. Introduction sentence omits three model predictors —
|
| # | Issue | Status |
|---|---|---|
| 1 | \exp{\ci} renders as single expression, not two endpoints | ✓ |
| 2 | Subscript mismatch \beta_j \in [\ci] | ✓ |
| 3 | \dsim macro instead of raw \dot\sim | ✓ |
| 4–5 | T-conditioning in zero-inflation exercises | ✓ |
| 6–7 | {{< slidebreak >}} in both files | ✓ |
| 8 | sqrtninj duplicate label clarified | ✓ |
| 9 | \hat{\v{\mu}} hat-atop-vector notation | ✓ |
| 10 | LTE inner expectations carry (X=x,T=t) | ✓ |
| 11 | β₀ notation collision Wald/LRT | ✓ |
| 12 | μ vs μ₀ inconsistency | ✓ |
| 13 | fig-needles-raw div format | ✓ |
| 14 | offset(log(nivdu)) inside formula | ✓ |
| 15 | Overdispersion conclusion conditioning | ✓ |
| 16 | Vector estimator convention in CLAUDE.md | ✓ |
| 17–18 | Slide breaks and tbl-needles-count div format | ✓ |
| 19 | Unreferenced tables | ✓ |
| 20 | shsyr label clarified | ✓ |
| 21 | fig-needles rate vs count narrative inconsistency | ✓ |
| 22 | Uncited Introduction claims | ✓ |
| 23 | tbl-compare-poisson-nb tibble mismatch | ✓ |
| 24 | Introduction omits hivstat from covariate list | ✓ |
| 25 | Stale ::: notes text contradicts expanded DAG | ✓ |
| 26 | Sex IRR paragraph omits hivstat from adjustment list | ✓ |
| 27 | hivstat in ZINB ziformula: convergence risk | ✓ |
Everything else looks good
- Overdispersion typo fixes ("if if", "value is predicted") correct.
- `\signt` and `\Expp` macro substitutions correct.
- First-person rewrite in Quasipoisson section reads well; NB content moved out of `::: notes` correct.
- Zero-inflation PMF and variance derivations (Law of Total Expectation/Variance, with explicit `(X=x,T=t)` conditioning restored in the final lines) mathematically correct throughout.
- LRT formula: $G^2 = 2[\hat\ell_1 - \hat\ell_0]$ with `\dsim`, M₁/M₀ naming, parenthetical disambiguating model-index from scalar Wald null β₀. Correct.
- Rate-ratio CI uses two explicit endpoints; log-scale CI uses `±` shorthand; both internally consistent.
- `glm1` (`shared_syr ~ homeless + age + sex + ethn + polydrug + hivstat + sexabuse + hplsns`, no offset) consistent with Vittinghoff §8.3.1 and the mediator argument. ✓
- `glm1_nb` formula matches `glm1`; `tbl-compare-poisson-nb` tibble lengths equal. ✓
- Both ZIP and ZINB use `ziformula = ~ homeless`, matching Vittinghoff's `inflate(i.homeless)`; ZINB `::: notes` correctly makes the separation issue (NB overdispersion competing with ZI component) the primary reason for the parsimonious submodel. ✓
- DAG (`fig-needles-dag`) redesigned to cross-sectional latent-cause structure; all-directed edges resolve the prior bidirected-arrow rendering concern. ✓
- `@tsui2009hcv` correctly hedged as "possible (though unconfirmed)"; BibTeX entry complete. ✓
- `\hat{\v{\mu}}` (hat atop tilde-vector symbol) correct per documented convention in CLAUDE.md. ✓
- All `{{< slidebreak >}}` shortcodes in new content correct; all remaining `---` markers in both files are pre-existing.
- Merge commit `03effe5` (main → branch) introduces no new issues in the book content.
|
Claude finished @d-morrison's task in 4m 34s —— View job Code Review: PR #860 — Revise count regression chapter Tasks
All 27 prior findings remain resolved. The most recent merge of 1. Introduction sentence still omits three model predictors —
|
| # | Issue | Status |
|---|---|---|
| 1 | \exp{\ci} renders as single expression, not two endpoints | ✓ |
| 2 | Subscript mismatch \beta_j \in [\ci] | ✓ |
| 3 | \dsim macro instead of raw \dot\sim | ✓ |
| 4–5 | T-conditioning in zero-inflation exercises | ✓ |
| 6–7 | {{< slidebreak >}} in both files | ✓ |
| 8 | sqrtninj duplicate label clarified | ✓ |
| 9 | \hat{\v{\mu}} hat-atop-vector notation | ✓ |
| 10 | LTE inner expectations carry (X=x,T=t) | ✓ |
| 11 | β₀ notation collision Wald/LRT | ✓ |
| 12 | μ vs μ₀ inconsistency | ✓ |
| 13 | fig-needles-raw div format | ✓ |
| 14 | offset(log(nivdu)) inside formula | ✓ |
| 15 | Overdispersion conclusion conditioning | ✓ |
| 16 | Vector estimator convention in CLAUDE.md | ✓ |
| 17–18 | Slide breaks and tbl-needles-count div format | ✓ |
| 19 | Unreferenced tables | ✓ |
| 20 | shsyr label clarified | ✓ |
| 21 | fig-needles rate vs count narrative inconsistency | ✓ |
| 22 | Uncited Introduction claims | ✓ |
| 23 | tbl-compare-poisson-nb tibble mismatch | ✓ |
| 24 | Introduction omits hivstat | ✓ |
| 25 | Stale ::: notes text contradicts DAG | ✓ |
| 26 | Sex IRR paragraph omits hivstat | ✓ |
| 27 | ZINB ziformula convergence risk | ✓ |
Latest merges: no new issues
- Merge `141e3ab` (main → branch): changed only `.github/workflows/`, `CLAUDE.md`, skill files, and `renv.lock`. No book-content changes in the PR's changed files. ✓
- ZINB `::: notes` block: now correctly uses both ZIP and ZINB with `ziformula = ~ homeless` and explains the ZINB-specific separation issue as the primary reason for parsimony. Prior ZIP/DAG inconsistency fully resolved. ✓
- `\hat{\v{\mu}}` (hat atop tilde-vector) is correct per the newly documented CLAUDE.md convention. ✓
- `probability.qmd#def-offset` cross-link (replacing orphan `poisson.qmd#def-offset`) is valid. ✓
- DAG redesigned to cross-sectional latent-cause structure with all-directed edges; bidirected rendering concern is moot. ✓
Built to branch |
Addresses issue #747.
Generated with Claude Code