Skip to content

revise count regression chapter#860

Open
d-morrison wants to merge 41 commits into
mainfrom
claude/issue-747-20260516-0652
Open

revise count regression chapter#860
d-morrison wants to merge 41 commits into
mainfrom
claude/issue-747-20260516-0652

Conversation

@d-morrison

Copy link
Copy Markdown
Owner

Addresses issue #747.

  • Fix typos in overdispersion definition ("if if", "value is predicted")
  • Replace \mathbb{E} with \Expp macro in rate-ratio derivation
  • Replace \text{sign} with \signt macro in deviance residuals
  • Rewrite first-person language in Quasipoisson section
  • Move NB models content out of notes-only block
  • Add worked solutions for zero-inflation PMF and E/Var exercises
  • Flesh out inference section with explicit CI, hypothesis test, and LRT formulas

Generated with Claude Code

- Fix duplicate "if" and "value is predicted" typos in overdispersion definition
- Replace \mathbb{E} with \Expp macro in rate-ratio derivation for consistency
- Replace \text{sign} with \signt macro in deviance residual formula and callout
- Rewrite first-person language in Quasipoisson section
- Move NB models explanatory content out of notes-only block so it renders in HTML/PDF
- Add worked solutions for zero-inflation PMF and E/Var exercises
- Flesh out inference section with explicit CI, hypothesis test, and LRT formulas

Co-authored-by: Douglas Ezra Morrison <d-morrison@users.noreply.github.com>
@d-morrison

Copy link
Copy Markdown
Owner Author

Claude Code CLI (local session) is resolving the merge conflicts with main here. Paws off until done.

…0516-0652

# Conflicts:
#	_subfiles/count-regression/_sec_poisson_RRs.qmd

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
@d-morrison

Copy link
Copy Markdown
Owner Author

Merge conflicts resolved by merging main into the branch (commit 7181ab9). The only conflict was one expectation expression in _sec_poisson_RRs.qmd — kept main's \E{Y |\red{X_j = a}, …} form (preferred \E{}/\red{} macros, and consistent with the unconflicted denominator just below). Everything else (including the submodule pointer → af44e55) auto-merged cleanly. Paws off released.

@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 3m 49s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read changed files
  • Review each file for correctness, notation, conventions
  • Post findings

Overall this is a clean, well-targeted PR. The typo fixes, macro substitutions, language rewrites, and worked solutions are all correct. A few small issues below.


_sec_zero-inflation.qmd

1. New --- slide break (line 69) violates repo convention.
The PR adds a --- between the PMF solution block and the exr-zinf-moments exercise. Per CLAUDE.md, slide breaks in .qmd files must use {{< slidebreak >}}, not ---. The pre-existing --- at line 31 is outside this PR's scope, but the new one at line 69 should be changed.

Fix this →

2. Inconsistent T-conditioning in exr-zinf-moments exercise text (line 73).
The exercise statement defines $\pi = P(Z=1|X=x,T=t)$, but the top of the section establishes that the model for $Z$ is $P(Z=1|X=x) = \pi(x)$ — no $T$ dependence. The solution on line 78 correctly writes $\pi = \P(Z=1|X=x)$ (without $T$). The exercise text should be updated to match.

Fix this →


_sec_poisson_inference.qmd

3. Subscript mismatch in CI formula (lines 5–18).
The formula \beta_j \in \left[\ci \right] is inconsistent: \ci expands to \hat\beta ± z_{1-α/2} · \widehat{se}(\hat\beta) (where \hb = \hat\beta has no subscript). So the rendered formula reads $\beta_j \in [\hat\beta \pm \cdots]$, with $j$ on the left but not the right. The previous version used \beta (no subscript) on the LHS, which was self-consistent with the \ci macro. Either:

  • revert the LHS to \beta and add a sentence like "where $\beta_j$ is any individual coefficient", or
  • expand \ci explicitly as \hat\beta_j \pm \ciradf{\hat\beta_j} for the subscripted case.

The same mismatch appears in the rate-ratio CI (e^{\beta_j} \in [\exp{\ci}]).

4. Minor: \dot\sim spacing (line 44).
The new code writes $G^2 \dot\sim \chi^2_{p_1 - p_0}$ but the rest of the project (e.g., _sec_logistic_gof.qmd, _sec_compare_lm_glm_deviance.qmd) uses \dot \sim with a space. Trivial but worth standardizing.


Everything else looks good

  • Overdispersion typo fixes ("if if", "value is predicted") are correct.
  • \signt macro substitution is correct; the macro is defined at latex-macros/macros.qmd:525.
  • \Expp substitutions are correct; macro at latex-macros/macros.qmd:223.
  • First-person language removal in Quasipoisson section (chapters/count-regression.qmd:92–102) reads well.
  • NB models content moved out of ::: notes correctly — now visible in all output formats.
  • Zero-inflation worked solutions (PMF and E/Var derivations) are mathematically correct throughout.
  • LRT section is well-written with appropriate citations.

fig-needles plotted y = shsyryn (binary yes/no indicator) but its caption
reads "Rates of needle sharing" and the Poisson model below it fits
shared_syr ~ ... + offset(log(nivdu)) — i.e., the rate of shared
syringes per injection. Switch to y = shared_syr / nivdu (rate) with
nivdu > 0 filter to avoid divide-by-zero, and label the y-axis.

fig-needles-pois used #| tbl-cap with a fig- label; the chunk produces a
plot, so swap to fig-cap to keep the cross-reference type consistent.

https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 7m 33s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather diff and context
  • Run three-angle candidate search (line-by-line, removed-behavior, cross-file)
  • Verify top candidates
  • Post findings

Overall the PR is well-targeted: typo fixes, macro substitutions, language rewrites, and worked solutions are all correct. Four issues found.


1. \exp{\ci} renders as a single expression, not two endpoints

File: _subfiles/count-regression/_sec_poisson_inference.qmd, line 17

The formula e^{\beta_j} \in \left[\exp{\ci} \right] is broken. \exp is redefined as \operatorname{exp}\cb{#1} (curly-brace wrapper), and \ci expands to \hb {\color{red}\pm} \cirad. So \exp{\ci} renders as:

exp{ β̂  ±  z_{1-α/2} · se(β̂) }

— a single expression with ± inside the braces, not two separate endpoints. The prose immediately above (line 14) explicitly says "by exponentiating both endpoints", which directly contradicts the formula. A correct rendering requires writing out the two endpoints explicitly, e.g.:

e^{\beta_j} \in \left[\exp{\hb - \cirad},\; \exp{\hb + \cirad}\right]

Fix this →


2. Subscript mismatch introduced by \beta_j change

File: _subfiles/count-regression/_sec_poisson_inference.qmd, lines 7 and 17

The PR changes the LHS of the CI formula from \beta to \beta_j, but \ci still expands to \hb ± \cirad = \hat\beta ± z·se(\hat\beta) — unsubscripted on both the estimator and the SE argument. The rendered result is:

β_j  ∈  [ β̂ ± z · se(β̂) ]

The subscript j appears on the left but nowhere inside the brackets, which is internally inconsistent. The old form \beta \in [\ci] was self-consistent (no subscripts anywhere). Either revert the LHS to \beta and add a sentence "where j indexes any individual coefficient", or expand \ci explicitly with subscripted arguments: \hat\beta_j \pm \ciradf{\hat\beta_j}.

Fix this →


3. \pi T-conditioning: exercise defines it differently from model and solution

File: _subfiles/count-regression/_sec_zero-inflation.qmd, lines 73 and 78 (also 34 and 39)

The model definition (line 6) specifies P(Z=1|X=x) = π(x) — no T-dependence. The exercise prompts (lines 34 and 73, the latter changed by this PR) define π = P(Z=1|X=x,T=t), conditioning on both X and T. But the solutions (lines 39 and 78, both added by this PR) re-define π = P(Z=1|X=x) — dropping T. A student reading the exercise wording would be justified in carrying T through their derivation, and would then get an answer that differs from the solution.

The fix is to make the exercise prompts match the model definition and solutions: drop T=t from the definition of π in both exercise texts (lines 34 and 73). The quantity that depends on T is μ = E[Y|Z=0,X=x,T=t], not π.

Fix this →


4. Data dictionary labels shsyryn but response variable is now shared_syr

File: chapters/exr-needle-sharing.qmd, line 36

set_variable_labels() on line 36 still labels shsyryn as "shared syringe yes/no". The figure (line 80) and model formula (line 133) now use shared_syr (the count variable) as the outcome, but shared_syr has no label in the data dictionary call. The rendered dictionary table will describe shsyryn (a binary indicator, no longer the response) but will leave shared_syr (the actual response) unlabeled, giving students a misleading picture of the data.

Fix this →


5. \dot\sim should use the \dsim macro

File: _subfiles/count-regression/_sec_poisson_inference.qmd, line 44

The new LRT line writes $G^2 \dot\sim \chi^2_{p_1 - p_0}$. The project defines \dsim (and \ddist) as \ \dot{\sim} \ with surrounding thin spaces; all other files in the codebase use \dsim or \ddist (or \dot \sim with a space) rather than the spaceless \dot\sim. The raw form produces slightly tighter horizontal spacing than the rest of the document.

Fix this →


Everything else looks good

  • Overdispersion typo fixes ("if if", "value is predicted") are correct.
  • \signt and \Expp macro substitutions are correct.
  • First-person language removal in Quasipoisson section reads well.
  • NB models content moved out of ::: notes is correct — now visible in all output formats.
  • The --- between exercise blocks (lines 31 and 69) was pre-existing, not added by this PR.
  • Zero-inflation PMF derivation is mathematically correct.
  • Variance derivation (Law of Total Variance) algebra is correct throughout.
  • LRT section is well-written with appropriate citations.
  • tbl-capfig-cap fix on fig-needles-pois is correct.
  • shared_syr column confirmed present in the dataset; dplyr::filter(nivdu > 0) before dividing is the correct guard.

Data dictionary improvements:
- Add labels for the nine variables that previously had no label (id,
  age, dprsn_dx, sexabuse, shared_syr, hplsns, nivdu, sqrtnivd, shsyr),
  pulling descriptions from the rmb package's documentation
  (https://d-morrison.github.io/rmb/man/needle_sharing.html).
- Fix typo on sqrtninj: "sqrt(No. of infections in 30 days)" → "sqrt(No.
  of injections in 30 days)" (verified against .dta values; sqrtninj is
  the square root of nivdu, the injection count).
- Tighten and standardise wording on the other labels.
- Add a short prose blurb naming the dataset source (RMB2e chapter 8,
  UCSF companion website) and sample size (n = 128, 17 variables).

New demographics table (tbl-needles-demographics):
- Reports N, age (median, Q1–Q3), sex, ethnicity, HIV status,
  homelessness, polydrug use, and the two count variables (injections
  and shared syringes per 30 days), all on the analysis sample
  (Trans-sex singleton excluded).
- Verified locally: n = 127 after Trans exclusion, age median 41
  (IQR 35-48), 24% female, 59% White, 7% HIV+, 50% homeless.

https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 4m 57s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context
  • Read changed files and diff
  • Cross-check prior review findings against current state
  • Check new commit(s) for new issues
  • Post findings

The two latest commits (3323cd5 — fix wrong variables in exploratory/prediction figures; ceaac41 — add demographics table and enrich data dictionary) are well-targeted fixes. The shsyryn → shared_syr/nivdu rate correction and the tbl-cap → fig-cap fix are both correct. The demographics table code is correctly structured (consistent Characteristic/Value vector lengths, proper use of a separate demog_data object before the Trans removal step). Five issues from prior review rounds remain open; two new issues were introduced by the latest commits.


Still open: _subfiles/count-regression/_sec_poisson_inference.qmd

1. \exp{\ci} renders as a single expression, not two endpoints (line 17)

\exp is redefined in latex-macros/macros.qmd:291 as \operatorname{exp}\cb{#1}, which wraps its argument in curly braces. \ci expands to \hb {\color{red}\pm} \cirad, so \exp{\ci} renders as:

exp{ β̂  ±  z · se(β̂) }

— a single expression with ± inside the braces, not two separate endpoints. The prose at line 14 explicitly says "by exponentiating both endpoints", which directly contradicts the rendered formula. Two explicit endpoints are needed:

e^{\beta_j} \in \left[\exp{\hb - \cirad},\; \exp{\hb + \cirad}\right]

Fix this →

2. Subscript mismatch: \beta_j \in [\ci] (line 7)

\ci is defined as \hb {\color{red}\pm} \cirad = \hat\beta ± z·se(\hat\beta) — no j subscript. The LHS now reads β_j but the brackets show β̂ ± … without j, which is internally inconsistent. The pre-PR form \beta \in [\ci] was self-consistent. Either revert the LHS to \beta and add a sentence like "where j indexes any individual coefficient", or expand \ci explicitly as \hat\beta_j \pm \ciradf{\hat\beta_j}.

Fix this →

3. \dot\sim should use \dsim macro (line 44)

latex-macros/macros.qmd:116 defines \dsim as \ \dot{\sim} \ with surrounding thin spaces. The new line $G^2 \dot\sim \chi^2_{p_1 - p_0}$ uses the raw spaceless form.

Fix this →


Still open: _subfiles/count-regression/_sec_zero-inflation.qmd

4. π T-conditioning mismatch in exr-zinf-moments exercise text (line 73)

The model definition (line 6) specifies P(Z=1|X=x) = π(x) — no T-dependence. The solution (line 78) correctly writes π = P(Z=1|X=x). But the exercise text on line 73 defines π = P(Z=1|X=x,T=t). A student who treats T as conditioning in π would carry it through the derivation and arrive at a different expression than the solution. Fix: drop T=t from the definition of π in the exercise text.

Fix this →

5. New --- slide break at line 69 (added by this PR)

Per CLAUDE.md, slide breaks in .qmd files must use {{< slidebreak >}}, not ---. The --- between the .solution div closing and #exr-zinf-moments was added by this PR.

Fix this →


New issues in latest commits: chapters/exr-needle-sharing.qmd

6. New --- slide break at line 138 (added by ceaac41)

The demographics table block added by ceaac41 ends with:

:::

---

```{r}
#| tbl-cap: "Needle-sharing data"

The --- at line 138 is new content from this PR. Per CLAUDE.md it should be {{< slidebreak >}}.

Fix this →

7. sqrtnivd and sqrtninj have identical descriptions (lines 45 and 48)

The enriched set_variable_labels() call labels both sqrtnivd and sqrtninj as "sqrt(No. of injections in 30 days)". Their different names suggest they are distinct columns (possibly differing in the denominator or derivation). The data dictionary table will show two rows with the same description, giving students no way to distinguish them.

Fix this →


Everything else looks good

  • Overdispersion typo fixes ("if if", "value is predicted") correct.
  • \signt and \Expp macro substitutions correct.
  • First-person rewrite in Quasipoisson section reads well.
  • NB content moved out of ::: notes is correct.
  • Zero-inflation PMF and variance derivations are mathematically correct throughout (Law of Total Variance algebra verified).
  • shared_syr / nivdu rate figure with dplyr::filter(nivdu > 0) guard is the correct fix for the exploratory plot.
  • tbl-cap → fig-cap fix on fig-needles-pois is correct.
  • Demographics table code structure (Characteristic/Value vector lengths consistent, demog_data correctly scoped before the Trans removal step).
  • shared_syr label ("Number of shared syringes (in 30 days)") now present — addresses the prior review finding.

…ebreaks, dup label

_sec_poisson_inference.qmd:
- Make the coefficient-CI formula's subscripts internally consistent: expand
  \ci to \hat\beta_j \pm \ciradf{\hat\beta_j} (matching the \beta_j on the
  LHS) instead of using the unsubscripted \ci macro.
- Rewrite the rate-ratio CI as two explicit endpoints
  [\exp{\hat\beta_j - rad}, \exp{\hat\beta_j + rad}] — the prose says
  "exponentiating both endpoints" but \exp{\ci} expands to a single
  exp{...} expression with the ± inside the braces, not two endpoints.
- Switch \dot\sim to the project's \dsim macro on the LRT line so the
  spacing matches the rest of the codebase.

_sec_zero-inflation.qmd:
- exr-zinf-moments exercise text defined π conditional on T, but the
  model definition and the solution both define π conditional only on X.
  Drop T=t from the exercise prompt so a student's derivation cannot
  legitimately diverge from the solution.
- Replace the new --- slide separator above the exercise with
  {{< slidebreak >}} per CLAUDE.md.

exr-needle-sharing.qmd:
- Replace the --- I added between the new demographics table and the raw
  data table with {{< slidebreak >}}.
- Clarify the sqrtninj label: sqrtninj and sqrtnivd are numerically
  identical columns in the .dta (verified: zero rows differ across all
  128), so note the duplication instead of giving the two columns
  identical descriptions.

https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude encountered an error after 3m 45s —— View job


I'll analyze this and get back to you.

@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 3m 7s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read diff and changed files
  • Cross-check prior review findings against current state (commit 6316dcc)
  • Check for any new issues
  • Post findings

Commit 6316dcc ("Address #860 review findings") resolves six of the seven previously-reported issues. One T-conditioning mismatch remains. Everything else is now correct.


Resolved

  1. \exp{\ci} endpoint rendering — Fixed. The rate-ratio CI now correctly writes two explicit endpoints using \exp{\hat\beta_j - \ciradf{\hat\beta_j}} and \exp{\hat\beta_j + \ciradf{\hat\beta_j}} (lines 19–20). \exp{X} = \operatorname{exp}\cb{X} wraps each argument in properly-sized braces, and \ciradf{\hat\beta_j} = z_{1-\alpha/2} \cdot \widehat{se}(\hat\beta_j). Two distinct endpoints as intended. ✓

  2. Subscript mismatch in CI formula — Fixed. The LHS is now \beta_j and the RHS uses \hat\beta_j \pm \ciradf{\hat\beta_j}, so the subscript j is present throughout (_sec_poisson_inference.qmd, line 7). ✓

  3. \dsim macro — Fixed. Line 48 now reads $G^2 \dsim \chi^2_{p_1 - p_0}$, matching the rest of the project. ✓

  4. exr-zinf-moments exercise T-conditioning — Fixed. Line 73 now reads $\pi = P(Z=1|X=x)$, consistent with the model definition (line 6) and solution (line 78). ✓

  5. {{< slidebreak >}} in _sec_zero-inflation.qmd — Fixed. Line 69 uses the shortcode, not ---. ✓

  6. {{< slidebreak >}} in exr-needle-sharing.qmd — Fixed. Line 138 uses {{< slidebreak >}}. ✓

  7. sqrtnivd/sqrtninj duplicate labels — Fixed. sqrtninj now carries the label "sqrt(No. of injections in 30 days); duplicate of sqrtnivd", making the two distinct in the data dictionary. ✓


One remaining issue

exr-zinf-pmf exercise text still uses P(Z=1|X=x,T=t) (line 34)

The model definition (line 6) specifies P(Z=1|X=x) = π(x) — no T-dependence. The solution added by this PR (line 39) correctly defines π = P(Z=1|X=x) without T. But the exercise prompt at line 34 still asks students to express things "in terms of P(Z=1|X=x,T=t) and P(Y=y|Z=0,X=x,T=t)", conditioning on T for the Z probability.

A student following the exercise text literally would carry T through their derivation for the π component and arrive at an expression different from the solution. The prior review at 08:24:52Z explicitly flagged both lines 34 and 73 for this fix; line 73 was addressed but line 34 was not.

Fix: On line 34, change $P(Z=1|X=x,T=t)$ to $P(Z=1|X=x)$ in the list of building-block quantities. The full sentence would then read: "…into expressions involving $P(Z=1|X=x)$ and $P(Y=y|Z=0,X=x,T=t)$."

Fix this →


Everything else looks good

  • Overdispersion typo fixes ("if if", "value is predicted") correct.
  • \signt and \Expp macro substitutions correct.
  • First-person rewrite in Quasipoisson section reads well.
  • NB content moved out of ::: notes is correct.
  • Zero-inflation PMF and variance derivations mathematically correct throughout.
  • shared_syr / nivdu rate figure with dplyr::filter(nivdu > 0) guard is correct.
  • Demographics table code correctly structured; demog_data scoped before Trans removal step.
  • Pre-existing --- separators at lines 75 and 146 of exr-needle-sharing.qmd are outside this PR's scope.

The needle-sharing data has five ethnicity categories with n = 1 or 2
participants (Asian, Filipino, Indian, "Indian & White", "White & Hispa"),
which makes the data dictionary and demographics tables noisy and would
make any ethn-stratified model have empty/near-empty cells. Collapse
those into "Other" via forcats::fct_lump_min(min = 5).

Verified: post-lump ethn has 4 levels — White (76), AA (36),
Hispanic (10), Other (6). Lump correctly bundles the five rare
categories.

https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 2m 28s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read changed files and diff
  • Cross-check prior review findings against current state (latest commit 72fdee5)
  • Check for any new issues in latest commit
  • Post findings

Commit 72fdee5 ("Lump rare ethnicities via forcats::fct_lump_min") adds ethnicity lumping and is the only new commit since the last review. It introduces no new correctness issues. Six of the seven previously-reported issues remain resolved. One issue is still open.


Still open: _subfiles/count-regression/_sec_zero-inflation.qmd

exr-zinf-pmf exercise text still uses P(Z=1|X=x,T=t) (line 34)

The model definition (line 6) specifies P(Z=1|X=x) = π(x) — no T-dependence. The solution (line 39) correctly defines π = P(Z=1|X=x). But the exercise text on line 34 still asks students to expand in terms of $P(Z=1|X=x,T=t)$ (conditioning on T). A student following the exercise literally would carry T through their derivation and arrive at an expression inconsistent with the solution.

The same fix was applied to the analogous location in exr-zinf-moments (line 73) in commit 6316dcc; this location was missed.

Fix: On line 34, change $P(Z=1|X=x,T=t)$ to $P(Z=1|X=x)$.

Fix this →


Latest commit (72fdee5) — fct_lump_min addition

The new forcats::fct_lump_min(min = 5, other_level = "Other") call is correctly implemented:

  • fct_lump_min(min = 5) lumps levels with n < 5; the label "n<5 lumped to Other" (line 40) is accurate.
  • The namespace-qualified call avoids requiring an explicit library(forcats).
  • relevel(ref = "White") is safe after lumping, as White is a large group.
  • The demographics table correctly iterates over levels(demog_data$ethn), which will include "Other" if any ethnic group was lumped.

No issues in this commit.


All previously-reported issues resolved in commit 6316dcc

  1. \exp{\ci} endpoint rendering — Fixed. Two explicit endpoints using \exp{\hat\beta_j - \ciradf{\hat\beta_j}} and \exp{\hat\beta_j + \ciradf{\hat\beta_j}}. ✓
  2. Subscript mismatch in CI formula — Fixed. LHS \beta_j matches RHS \hat\beta_j ± \ciradf{\hat\beta_j}. ✓
  3. \dsim macro — Fixed. Line 48 now reads $G^2 \dsim \chi^2_{p_1 - p_0}$. ✓
  4. exr-zinf-moments T-conditioning (line 73) — Fixed. ✓
  5. {{< slidebreak >}} in _sec_zero-inflation.qmd (line 69) — Fixed. ✓
  6. {{< slidebreak >}} in exr-needle-sharing.qmd (line 138) — Fixed. ✓

Everything else looks good

  • Overdispersion typo fixes ("if if", "value is predicted") correct.
  • \signt and \Expp macro substitutions correct.
  • First-person rewrite in Quasipoisson section reads well.
  • NB content moved out of ::: notes is correct.
  • Zero-inflation PMF and variance derivations mathematically correct throughout.
  • shared_syr / nivdu rate figure with dplyr::filter(nivdu > 0) guard is correct.
  • Demographics table code correctly structured.
  • sqrtninj duplicate-label note is present.

sjPlot::plot_model(type="pred") doesn't expose the offset, so its y-axis
was on the *count* scale and didn't match @fig-needles (which plots
shared_syr / nivdu). Replace with a direct prediction grid where
nivdu = 1 forces log(nivdu) = 0 in the offset; predict(type="response")
then returns the Poisson rate per injection. Overlay observed
shared_syr / nivdu points (sized by nivdu) and the fitted rate lines,
faceted by homeless × polydrug, coloured by sex — same panel structure
as @fig-needles.

Verified locally: pred_grid 400 rows, rates 4.5e-9 to 0.066; observed
data 121 complete-case rows after dropping NAs in homeless/polydrug/
sex/age (also removes the previous stray "homeless: NA" facet).

https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read changed files and latest commit diff
  • Cross-check prior review findings against current state
  • Check new commit d4b48d9 for new issues
  • Post findings

View job run

exr-prac-glm-score's solution invokes orthogonality of the residual
vector against each predictor column. def-orthogonal-vectors already
exists in the math-prereqs linear-algebra section and matches the usage
exactly, so cross-link to it and add the explicit inner-product equation
so a student following the link back can verify the claim.

Also shorten fig-needles-pois's fig-cap below the 80-char limit.

https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 4m 43s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read diff and latest commits (d4b48d9, 05d412b)
  • Cross-check prior review findings against current state
  • Check new commits for new issues
  • Post findings

Six of the seven previously-reported issues remain resolved from commit 6316dcc. The two newest commits (d4b48d9, 05d412b) introduce one minor notation issue. One issue from prior rounds is still open.


Still open: _subfiles/count-regression/_sec_zero-inflation.qmd

exr-zinf-pmf exercise text still uses P(Z=1|X=x,T=t) (line 34)

The model definition (line 6) specifies P(Z=1|X=x) = π(x) — no T-dependence. The solution (line 39) correctly defines π = P(Z=1|X=x). The exercise prompt at line 34 still asks students to expand in terms of $P(Z=1|X=x,T=t)$ (T-conditioned). A student following the exercise literally would carry T through the derivation for the π component and arrive at an expression that differs from the solution.

The same fix was applied to the corresponding location in exr-zinf-moments (line 73) in commit 6316dcc; this location was missed.

Fix: On line 34, change $P(Z=1|X=x,T=t)$ to $P(Z=1|X=x)$.

Fix this →


New issue in 05d412b: _subfiles/count-regression/_exr-prac-glm-score.qmd

Notation inconsistency in new vector equation (lines 74–75)

The new content writes:

for each predictor $j$, the residual vector $(y - \hat\mu)$ satisfies
$\tp{\vx_{(j)}}(y - \hat\mu) = 0$,

\vx_{(j)} is bold (via the \vx macro defined in latex-macros/macros.qmd:343), but y and \hat\mu are unbolded. The file already uses \vy (defined at macros.qmd:349) in the log-likelihood signature on line 11 ($\ell(\beta_0, \beta_1; \vy, \vx)$). The residual vector should be (\vy - \hat\mu) to match the bold vector convention used by \vx_{(j)}. (There is no \vmu macro, so \hat\mu without bold is the best available form for the fitted-value vector.)

Fix this →


d4b48d9 — fig-needles-pois: rate-scale prediction plot

No correctness issues. The approach is technically sound:

  • nivdu = 1 in pred_grid forces log(nivdu) = 0 in the offset, so predict(type="response") returns the Poisson rate per injection — matching the y-axis of @fig-needles. ✓
  • obs_data uses shared_syr / nivdu on the same rate scale. ✓
  • with(glm1$model, expand.grid(...)) correctly sources factor levels from the model frame. ✓
  • !is.na(homeless) & !is.na(polydrug) & ... filter removes the prior stray "homeless: NA" facet. ✓
  • Facet structure (rows = homeless, cols = polydrug) mirrors @fig-needles. ✓

05d412b — Cross-link and caption

  • Cross-link math-prereqs.qmd#def-orthogonal-vectors is valid: def-orthogonal-vectors is defined in _subfiles/math-prereqs/_sec_linear_algebra.qmd (included by chapters/math-prereqs.qmd), and the path resolves correctly from the chapters/ directory when the subfile is included via {{< include >}}. ✓
  • Caption shortened from "Fitted Poisson rate model overlaid on observed needle-sharing rates" to "Fitted Poisson rate model and observed needle-sharing rates" — within the 80-char limit. ✓
  • Notation inconsistency in the new equation (see above). ✗

Everything else remains good

  • Overdispersion typo fixes, \signt/\Expp substitutions, first-person rewrite, NB content extraction, and LRT section all correct.
  • Zero-inflation PMF and variance derivations mathematically correct throughout.
  • shared_syr / nivdu rate figure with dplyr::filter(nivdu > 0) guard correct.
  • Demographics table structure correct; demog_data scoped before Trans removal.
  • sqrtninj duplicate-label note present.
  • All {{< slidebreak >}} fixes from 6316dcc intact.

@d-morrison

Copy link
Copy Markdown
Owner Author

@claude address reviews

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

👀 Picked up by workflow run #26835004505. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that.

@d-morrison

Copy link
Copy Markdown
Owner Author

@claude add a graph of sharedsyr vs nivdu

claude Bot and others added 3 commits June 2, 2026 17:09
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Shows the raw count relationship between shared syringes and injections,
complementing fig-needles (which shows the rate shared_syr/nivdu).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

👀 Picked up by workflow run #26835077308. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that.

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

No new comments. The request is already fully addressed. Here's what happened and the current state of the PR:


The graph of shared_syr vs nivdu was already added to the PR branch in the session just before this one (commit 5e115325, "Add fig-needles-raw: scatter of shared_syr vs nivdu"). The figure (#fig-needles-raw) is now in chapters/exr-needle-sharing.qmd, placed between fig-needles (the rate plot) and the covariate-counts section:

#| fig-cap: "Shared syringes vs. number of injections (raw counts)"
#| label: fig-needles-raw
needles |>
  ggplot(aes(x = nivdu, y = shared_syr, shape = sex, col = ethn)) +
  geom_point(alpha = .5) +
  facet_grid(cols = vars(polydrug), rows = vars(homeless)) +
  labs(
    x = "Number of injections (in 30 days)",
    y = "Number of shared syringes (in 30 days)"
  ) +
  theme(legend.position = "bottom")

That session also fixed the last outstanding review finding: the exr-zinf-pmf exercise prompt (line 34 of _sec_zero-inflation.qmd) now correctly uses $P(Z=1|X=x)$ without T-conditioning, matching the model definition and solution. All previously-reported review issues are now resolved.

— posted by @claude post-step from workflow run

@d-morrison

Copy link
Copy Markdown
Owner Author

Claude Code CLI (local session) is working on this — paws off until I'm done.

- fig-needles-raw: add dashed x=y line (shared syringes can't exceed
  injections) and wrap the bottom legends onto multiple rows so they
  don't get clipped
- add fig-needles-dag: hypothesized causal structure (demographics +
  social factors -> nivdu -> shared_syr)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Gather diff and read changed files
  • Run multi-angle review (line-by-line, removed-behavior, cross-file)
  • Verify candidates
  • Post findings

View job run

Per the latest @claude review's four findings:

1. Intro covariate list (line ~55): add HIV serostatus alongside age,
   sex, housing status, and drug-use patterns; hivstat is in glm1 now
   and was conspicuously missing from the overview sentence.

2. Stale "simplified DAG" note (line ~187): the previous note said a
   fuller model could add HIV status, depression, hopelessness — but
   the DAG already shows all three (and glm1 includes hivstat).
   Replaced with a note that explains which DAG nodes the fitted model
   does and doesn't condition on, and points to the mental-health
   nodes as an extension exercise.

3. Sex IRR paragraph (line ~426): drop the hard-coded adjustment list
   ("age, housing status, and drug-use pattern") and use the generic
   "the other covariates" phrasing the homelessness and polydrug
   paragraphs already use — now consistent and won't drift if the
   covariate set changes again.

4. Zero-inflation ziformulas (extensions, lines 36 and 61): drop
   hivstat from the zero-inflation submodel — with only ~8/128 HIV+
   participants this risks the same convergence failure the prior
   code explicitly avoided for the homeless*polydrug interaction.
   Keep hivstat in the count formula and add a comment recording the
   rationale.

https://claude.ai/code/session_01PtDgktdMme5SZtze1EGx9U
@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 3m 19s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read current state of changed files
  • Cross-check prior review findings against current state
  • Run fresh multi-angle review on latest commit (a95e486)
  • Post findings

Commit a95e486 ("Address #860 review: hivstat consistency + ziformula safety") resolves all four issues that had been open since the hivstat expansion. The PR is now clean.


All four previously-open issues resolved in a95e486

1. Introduction covariate list (chapters/exr-needle-sharing.qmd, line 56) — Fixed. Now reads "…age, sex, housing status (homelessness), drug-use patterns, and HIV serostatus". ✓

2. Stale ::: notes text (chapters/exr-needle-sharing.qmd, lines 187–190) — Fixed. The speaker-notes block now correctly reads: "This DAG depicts the hypothesized causal structure; the fitted Poisson model in @tbl-needles-pois adjusts only for the demographic, social, and serostatus subset, not for the mental-health nodes (dprsn_dx, hplsns, sexabuse) — extending the model is left as an exercise." Accurately describes what is and isn't in the model. ✓

3. Sex IRR paragraph (chapters/exr-needle-sharing.qmd, line 428) — Fixed. Now reads "adjusting for the other covariates", consistent with the homelessness and HIV serostatus paragraphs. ✓

4. hivstat in ziformula (chapters/exr-needle-sharing-extensions.qmd, lines 39 and 67) — Fixed. hivstat removed from both zero-inflation sub-model formulas; each now has an explanatory comment documenting why (only ~8/128 participants HIV+, mirroring the rationale for dropping the homeless*polydrug interaction). The count component (formula = ...) retains hivstat in both fits. ✓


Full history: all 27 findings across all review rounds confirmed resolved

# Issue Status
1 \exp{\ci} renders as single expression, not two endpoints
2 Subscript mismatch \beta_j \in [\ci]
3 \dsim macro instead of raw \dot\sim
4 exr-zinf-moments T-conditioning (line 73)
5 exr-zinf-pmf T-conditioning (line 34)
6 {{< slidebreak >}} in _sec_zero-inflation.qmd
7 {{< slidebreak >}} in exr-needle-sharing.qmd (×2)
8 sqrtninj duplicate label clarified
9 (\vy - \hat{\v{\mu}}) hat-atop-vector notation
10 LTE inner expectations carry (X=x,T=t) explicitly
11 β₀ notation collision between Wald/LRT sections
12 μ vs μ₀ inconsistency across adjacent solutions
13 fig-needles-raw div format
14 offset(log(nivdu)) inside formula
15 Overdispersion conclusion uses specific-value conditioning
16 Vector estimator notation rule documented in CLAUDE.md
17 Two raw --- slide breaks (commit ca62297)
18 tbl-needles-count div format
19 Unreferenced tables
20 shsyr label clarified
21 fig-needles rate vs count narrative inconsistency
22 Uncited Introduction claims
23 Render-breaking tbl-compare-poisson-nb tibble mismatch
24 Introduction omits hivstat from covariate list ✓ (commit a95e486)
25 Stale ::: notes text contradicts expanded DAG ✓ (commit a95e486)
26 Sex IRR paragraph names adjustment set without hivstat ✓ (commit a95e486)
27 hivstat in ziformula — convergence risk ✓ (commit a95e486)

Everything looks good

  • Overdispersion typo fixes ("if if", "value is predicted") correct.
  • \signt and \Expp macro substitutions correct.
  • First-person rewrite in Quasipoisson section reads well; NB content moved out of ::: notes correct.
  • Zero-inflation PMF derivation ($P(Y=0|X,T) = \pi + (1-\pi)e^{-\mu_0}$) and Law of Total Variance derivation ($\text{Var}[Y|X=x,T=t] = (1-\pi)\mu_0(1+\pi\mu_0)$) both mathematically correct throughout.
  • LRT formula: $G^2 = 2[\hat\ell_1 - \hat\ell_0]$ with \dsim, M₁/M₀ naming, and parenthetical distinguishing model-index subscripts from the scalar Wald null β₀. Correct.
  • Rate-ratio CI uses two explicit endpoints; log-scale CI uses ± shorthand — both internally consistent.
  • glm1 formula (shared_syr ~ age + sex + homeless + polydrug + hivstat, no offset) consistent with Vittinghoff §8.3.1 and mediator argument. ✓
  • glm1.nb formula now matches glm1 (both have hivstat); tbl-compare-poisson-nb tibble lengths equal. ✓
  • ziformula = ~ age + sex + homeless + polydrug (no hivstat) in both ZIP and ZINB fits; explanatory comments present. ✓
  • pred_grid includes hivstat = levels(hivstat) for full prediction surface; predict(type="response") returns counts directly. ✓
  • fct_lump_min(min = 5) and relevel(ref = "White") correctly ordered.
  • demog_data (gtsummary::tbl_summary) scoped before Trans removal; demographics table caption notes "all participants". ✓
  • DAG structure expanded to all 9 covariate nodes; caption and ::: notes block accurately describe which nodes the model does and doesn't condition on. ✓
  • \hat{\v{\mu}} (hat atop tilde-vector symbol) correct per documented convention in CLAUDE.md.
  • @tsui2009hcv correctly hedged as "possible (though unconfirmed)"; BibTeX entry complete. ✓
  • All {{< slidebreak >}} shortcodes in new content correct; all remaining --- markers in the file are pre-existing.

@d-morrison

Copy link
Copy Markdown
Owner Author

Claude Code CLI (local session) is working on this — paws off until I'm done.

- Migrate def-offset into the rendered count chapter and link it with
  @def-offset (the old poisson.qmd#def-offset target is an orphan chapter
  that never renders, so the link was dead).
- tbl-zeroinf-nb: reduce the zero-inflation submodel to ziformula = ~ 1.
  With all four covariates the NB overdispersion and the zero-inflation
  component compete to explain the zeros, the logistic submodel separates,
  and the odds ratios run off to +/-Inf. The DAG also gives no covariates
  to a structural-zero class. Add a note explaining both reasons.
- Put shared_syr (count) y-axes on a pseudo_log scale in fig-needles,
  fig-needles-raw, and fig-needles-pois (handles the many zeros). For
  fig-needles-raw, transform both axes so the y = x reference line stays
  straight.
- Un-nest aes() out of ggplot() in those three figures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 3m 55s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read diff and new commit (b00607f)
  • Cross-check prior review findings against current state
  • Run multi-angle review on new content
  • Post findings

All 27 prior findings remain resolved. Commit b00607f ("Fix offset xref; drop ZI covariates; pseudo-log count axes; un-nest aes") introduces one issue.


1. DAG argument in ZINB ::: notes is inconsistent with the ZIP model above it — chapters/exr-needle-sharing-extensions.qmd, lines 79–86

The new ::: notes block gives two reasons for the ZINB using ziformula = ~ 1:

  • The DAG gives it no covariates. @fig-needles-dag posits a single process in which the covariates act on shared_syr…; it does not posit a separate latent class of "structural never-sharers" for those covariates to predict. So a covariate-laden zero-inflation submodel is not identified by the assumed causal structure.
  • The data cannot identify it either. The negative binomial's overdispersion parameter and the zero-inflation component both compete to explain the excess zeros, so…the logistic fit separates.

The first reason (the DAG argument) is presented as a primary justification, but the ZIP model immediately above uses ziformula = ~ age + sex + homeless + polydrug. If the DAG does not support covariates for the structural-zero class, that argument applies equally to ZIP — yet ZIP retains its covariates without comment. A student reading the note will notice the logical tension: if the DAG disqualifies covariates in the ZINB zero-inflation component, why are they present in the ZIP zero-inflation component?

The second reason (separation due to competition with the NB overdispersion parameter) is ZINB-specific and does not apply to the ZIP model, which has no such competing parameter. This is the substantive reason the two models differ.

The fix is to reframe the note so the ZINB-specific separation issue is the primary (and sufficient) reason, and the DAG argument is presented as a secondary note that applies to both models:

::: notes
Unlike the zero-inflated Poisson model above,
here the zero-inflation submodel uses only an intercept (`ziformula = ~ 1`).

The main reason is model-specific:
in the negative binomial,
the overdispersion parameter and the zero-inflation component
**both compete to explain the excess zeros**.
Including all four covariates in the zero-inflation submodel
causes the logistic fit to *separate*:
its coefficients diverge and the exponentiated estimates run off to $\pm\infty$.
Reducing the zero-inflation submodel to a single constant removes the separation.

A second, structural reason applies to **both** zero-inflated models:
@fig-needles-dag posits a single process in which the covariates act on `shared_syr`;
it does not posit a separate latent class of structural never-sharers
for those covariates to predict.
Covariates in the zero-inflation submodel are therefore not supported by the causal graph.
The ZIP model above retains them for illustration purposes.
:::

Fix this →


Everything else in b00607f looks good

  • @def-offset definition (_subfiles/count-regression/_sec_pois-reg_intro.qmd, lines 75–81): definition correctly placed after the ::: notes block that references it; @def-offset resolves within chapters/count-regression.qmd which includes this subfile at line 29. ✓
  • Dead link fixed: poisson.qmd#def-offset was a link to an orphan chapter that never renders; replaced with @def-offset referencing the inline definition. ✓
  • pseudo_log y-axis on fig-needles, fig-needles-raw, and fig-needles-pois: appropriate for count data with many zeros; break values c(0, 1, 3, 10, 30, 60) are sensible. ✓
  • Both axes pseudo_log in fig-needles-raw: the comment "a shared transform keeps the y = x reference line straight (data diagonal -> panel diagonal)" is correct — applying the same monotone transform to both axes preserves the linearity of y = x. ✓
  • ZINB ziformula = ~ 1: correctly addresses the prior review finding about convergence/separation risk in the ZINB zero-inflation submodel. ✓
  • aes() un-nesting: style change only, no effect on rendering. ✓

All 27 prior findings confirmed resolved

# Issue Status
1 \exp{\ci} renders as single expression, not two endpoints
2 Subscript mismatch \beta_j \in [\ci]
3 \dsim macro instead of raw \dot\sim
4–5 T-conditioning in zero-inflation exercises
6–7 {{< slidebreak >}} in both files
8 sqrtninj duplicate label clarified
9 \hat{\v{\mu}} hat-atop-vector notation
10 LTE inner expectations carry (X=x,T=t)
11 β₀ notation collision Wald/LRT
12 μ vs μ₀ inconsistency
13 fig-needles-raw div format
14 offset inside formula
15 Overdispersion conclusion conditioning
16 Vector estimator convention in CLAUDE.md
17–18 Slide breaks and tbl-needles-count div format
19 Unreferenced tables
20 shsyr label clarified
21 fig-needles rate vs count narrative inconsistency
22 Uncited Introduction claims
23 tbl-compare-poisson-nb tibble mismatch
24 Introduction omits hivstat from covariate list
25 Stale ::: notes text contradicts expanded DAG
26 Sex IRR paragraph names adjustment set without hivstat
27 hivstat in ZINB ziformula — convergence risk

Rebuild the needle-sharing DAG to depict structure among the covariates:
- Birth-fixed variables (age, sex, ethn) stay exogenous.
- The six time-varying variables (homeless, polydrug, hivstat, sexabuse,
  dprsn_dx, hplsns) are joined pairwise by bidirected edges (<->), denoting
  associations whose causal direction is unknown.
Lay the time-varying nodes on a hexagon and render with ggdag so the
bidirected web is legible; explain the bidirected edges in the caption and
notes. Add "bidirected" and "leftrightarrow" to WORDLIST.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 7m 18s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read diff and new commits
  • Cross-check prior review findings against current state
  • Run multi-angle review on new content
  • Post findings

All 27 findings from prior rounds remain resolved. Commit e433d8e ("Show covariate causal structure in fig-needles-dag") introduces one potential rendering concern. One issue from the prior round (b00607f) remains open.


Still open from prior round: chapters/exr-needle-sharing-extensions.qmd, lines 79–86

ZINB ::: notes DAG argument contradicts ZIP model above

The ZINB ::: notes block (lines 74–98) presents two reasons for ziformula = ~ 1:

  • The DAG gives it no covariates. @fig-needles-dag posits a single process in which the covariates act on shared_syr…; it does not posit a separate latent class of "structural never-sharers" for those covariates to predict.
  • The data cannot identify it either. The negative binomial's overdispersion parameter and the zero-inflation component both compete to explain the excess zeros…

The DAG argument (bullet 1) applies equally to the ZIP model immediately above — yet ZIP uses ziformula = ~ age + sex + homeless + polydrug. A student reading sequentially hits the DAG justification for ziformula = ~ 1 in ZINB and immediately wonders why ZIP was allowed to use the same covariates in its zero-inflation submodel. The notes never explain this. The separation issue (bullet 2) is ZINB-specific and is the substantive reason the two models differ; it should be the primary reason.

The suggested fix from the prior review at 2026-06-03T19:40:08Z still stands: reframe so the separation reason is primary ("Unlike ZIP, the NB overdispersion parameter and the zero-inflation component compete to explain excess zeros, causing separation when all four covariates are included") and the DAG argument is presented as a secondary note that applies to both models — with a sentence acknowledging that ZIP retains covariates for illustration.

Fix this →


New commit e433d8e — potential concern: bidirected-edge rendering in ggdag 0.2.13

File: chapters/exr-needle-sharing.qmd, lines 158–168

The caption (line 184) explicitly promises "bidirected arrows ($\leftrightarrow$)", implying that the bidirected edges among the time-varying nodes render as visually distinct double-headed arrows. In ggdag 0.2.13, however, dagitty bidirected edges (<->) may be expanded into two antiparallel unidirectional arrows (one row each way in the tidy_dagitty() output) rather than rendered as a single double-headed edge. With 15 bidirected pairs, this means up to 30 unidirectional arrows among 6 tightly-spaced hexagonal nodes — visually indistinguishable from the directed arrows to nivdu and shared_syr. The figure would not communicate the intended structure.

If the local render confirms that directed and bidirected edges are not visually distinguished, consider using geom_dag_edges_arc() (with aes(colour = direction) or similar) to separate the two types, or rendering the bidirected edges in a different colour or linetype. If the rendered figure clearly shows the bidirected structure, this concern is resolved.

Verify this →


New commit e433d8e — everything else looks good

  • DAG string construction via paste(covariates, "-> ...") and combn(timevarying, 2, ...) is syntactically correct for dagitty. ✓
  • Custom hexagonal coordinates via dagitty::coordinates() are valid; all 11 node names are present in both x and y lists. ✓
  • ggraph::circle() is correctly namespace-qualified; ggraph is in renv.lock. ✓
  • geom_dag_point(), geom_dag_text(), and theme_dag() are all available in ggdag 0.2.13 and used correctly. ✓
  • Caption explains the distinction between exogenous (birth-fixed) and time-varying nodes. ✓
  • $\leftrightarrow$ in the caption will render as LaTeX math in HTML and PDF output. ✓
  • "bidirected" and "leftrightarrow" added to inst/WORDLIST as expected. ✓
  • New prose in ::: notes ("The bidirected arrows … adjusted association, not an isolated causal effect") is methodologically correct. ✓
  • The updated ::: notes speaker text no longer contains the stale "simplified DAG" wording; it accurately says the model includes hivstat but omits dprsn_dx, hplsns, sexabuse. ✓

All 27 prior findings confirmed resolved

# Issue Status
1 \exp{\ci} renders as single expression, not two endpoints
2 Subscript mismatch \beta_j \in [\ci]
3 \dsim macro instead of raw \dot\sim
4–5 T-conditioning in zero-inflation exercises
6–7 {{< slidebreak >}} in both files
8 sqrtninj duplicate label clarified
9 \hat{\v{\mu}} hat-atop-vector notation
10 LTE inner expectations carry (X=x,T=t)
11 β₀ notation collision Wald/LRT
12 μ vs μ₀ inconsistency
13 fig-needles-raw div format
14 offset(log(nivdu)) inside formula
15 Overdispersion conclusion conditioning
16 Vector estimator convention in CLAUDE.md
17–18 Slide breaks and tbl-needles-count div format
19 Unreferenced tables
20 shsyr label clarified
21 fig-needles rate vs count narrative inconsistency
22 Uncited Introduction claims
23 tbl-compare-poisson-nb tibble mismatch
24 Introduction omits hivstat from covariate list
25 Stale ::: notes text contradicts expanded DAG
26 Sex IRR paragraph names adjustment set without hivstat
27 hivstat in ZINB ziformula — convergence risk

@d-morrison

Copy link
Copy Markdown
Owner Author

@claude review again — addressed: dropped the zero-inflation covariates in tbl-zeroinf-nb (the NB overdispersion + ZI component were competing to explain the zeros, so the logistic submodel separated and the ORs ran to ±∞; now ziformula = ~ 1), migrated def-offset into the rendered chapter (the old poisson.qmd#def-offset link pointed to an orphan chapter), put the count y-axes on a pseudo-log scale, un-nested aes() from ggplot(), and rebuilt fig-needles-dag to show the covariate causal structure (bidirected edges among the time-varying variables).

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

👀 Picked up by workflow run #26909201792. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that.

- The offset reference pointed at poisson.qmd#def-offset, but poisson.qmd is
  include-only (pulled into probability.qmd) and has no standalone page. Point
  the link at probability.qmd#def-offset, where the existing def-offset anchor
  actually renders. (Earlier I had added a local def-offset definition; that
  duplicated the anchor, so this replaces it with the corrected link.)
- Brace-protect the C in "hepatitis {C}" in the tsui2009hcv title so
  sentence-case citation styles don't lowercase it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 6m 56s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read diff and new commits since last review
  • Cross-check prior review findings against current state
  • Run multi-angle review (line-by-line, removed-behavior, cross-file, altitude)
  • Verify candidates
  • Post findings

All 27 prior findings remain resolved. The latest commit (97a3a72 — "Fix offset link target; protect hepatitis {C} in tsui2009hcv") introduces no new issues. One issue from prior rounds remains open.


Latest commit (97a3a72) — verified correct

probability.qmd#def-offset link — Valid. _sec_pois-reg_intro.qmd is included in chapters/count-regression.qmd; the link probability.qmd#def-offset is a cross-chapter link. probability.qmd includes poisson.qmd (line 318), and #def-offset is defined in poisson.qmd (line 326), so the rendered anchor exists on the probability.qmd page. ✓

{C} brace-protection in tsui2009hcv BibTeX — Correct: protects "C" from sentence-casing in citation styles. ✓

Duplicate local #def-offset removed — Correct: the local definition in _sec_pois-reg_intro.qmd was a duplicate of the canonical one in poisson.qmd; removing it eliminates the duplicate-anchor warning. ✓


Still open: chapters/exr-needle-sharing-extensions.qmd

ZINB ::: notes DAG argument contradicts ZIP model above (lines 79–86)

The ZINB notes block states as its first reason for ziformula = ~ 1:

"The DAG gives it no covariates. @fig-needles-dag posits a single process in which the covariates act on shared_syr … it does not posit a separate latent class of 'structural never-sharers' for those covariates to predict. So a covariate-laden zero-inflation submodel is not identified by the assumed causal structure."

But the ZIP model immediately above (line 39) uses ziformula = ~ age + sex + homeless + polydrug — the same four covariates in its zero-inflation submodel — under the exact same DAG. No prose explains why the DAG argument applies to ZINB but not to ZIP.

A student reading sequentially will notice the ZIP zero-inflation submodel uses the same covariates that the ZINB note says are "not identified by the assumed causal structure." The ZINB-specific separation issue (bullet 2: "the NB overdispersion parameter and the zero-inflation component both compete to explain the excess zeros") is the substantive reason the two models differ; the DAG argument is the same for both.

This has been flagged in every review since 2026-06-03 19:40 UTC — six consecutive rounds.

Fix: Reframe the note so the separation reason is primary and ZINB-specific, and the DAG argument is presented as a secondary note applying to both models, with a sentence acknowledging that ZIP retains covariates in its zero-inflation submodel as a modeling choice (not because the DAG supports it):

::: notes
Unlike the zero-inflated Poisson model above,
here the zero-inflation submodel uses only an intercept (`ziformula = ~ 1`).

The main reason is ZINB-specific:
in the negative binomial model, the overdispersion parameter and the
zero-inflation component **both compete to explain excess zeros**,
so including all four covariates in the zero-inflation submodel causes
the logistic fit to *separate* — its coefficients diverge and the
exponentiated estimates run off to $\pm\infty$.
Reducing to an intercept-only submodel removes the separation.

A second, structural reason applies to **both** zero-inflated models:
@fig-needles-dag posits a single process in which the covariates act on
`shared_syr`; it does not posit a separate latent class of structural
never-sharers. The ZIP model above retains covariates in its zero-inflation
submodel for illustration purposes, not because the causal structure supports it.
:::

Fix this →


Minor note: bidirected-edge rendering in ggdag (fig-needles-dag)

The DAG code in e433d8e uses <-> edges in a dagitty string and renders them with geom_dag_edges(). In ggdag ≤ 0.2.x, bidirected edges may be expanded into two antiparallel unidirectional arrows (one row each way in tidy_dagitty()) rather than a single double-headed arrow — making them visually indistinguishable from directed edges. With 15 bidirected pairs among 6 tightly-spaced hexagonal nodes, the caption's promise of $\leftrightarrow$ arrows could be unfulfilled in the rendered figure. The PR preview at https://d-morrison.github.io/rme/pr-preview/pr-860/ would be the definitive check; if directed and bidirected edges are visually indistinguishable, geom_dag_edges_arc() or a colour/linetype split would fix it.


All 27 prior findings confirmed resolved

# Issue Status
1 \exp{\ci} renders as single expression
2 Subscript mismatch \beta_j \in [\ci]
3 \dsim macro
4–5 T-conditioning in zero-inflation exercises
6–7 {{< slidebreak >}} in both files
8 sqrtninj duplicate label
9 \hat{\v{\mu}} hat-atop-vector
10 LTE inner expectations carry (X=x,T=t)
11 β₀ notation collision Wald/LRT
12 μ vs μ₀ inconsistency
13 fig-needles-raw div format
14 offset inside formula
15 Overdispersion conclusion conditioning
16 Vector estimator convention in CLAUDE.md
17–18 Slide breaks and tbl-needles-count div format
19 Unreferenced tables
20 shsyr label clarified
21 fig-needles rate vs count narrative inconsistency
22 Uncited Introduction claims
23 tbl-compare-poisson-nb tibble mismatch
24 Introduction omits hivstat from covariate list
25 Stale ::: notes text contradicts expanded DAG
26 Sex IRR paragraph names adjustment set without hivstat
27 hivstat in ZINB ziformula — convergence risk

Everything else looks good

  • 97a3a72: probability.qmd#def-offset is valid (poisson.qmd included at line 318); duplicate local anchor correctly removed; {C} brace-protection in BibTeX is correct. ✓
  • e433d8e DAG expansion: timevarying (6 elements) matches hex_x/hex_y (both 6 elements); setNames() call is safe; <-> syntax is supported by dagitty. ✓
  • glm1.nb formula (shared_syr ~ age + sex + homeless + polydrug + hivstat) matches glm1; tbl-compare-poisson-nb tibble lengths are equal. ✓
  • ZIP zinf_fit1 and ZINB zinf_fit_nb are correctly named and independently scoped; no cross-contamination. ✓
  • Overdispersion, \signt/\Expp substitutions, first-person rewrite, NB extraction all correct throughout. ✓

@d-morrison

Copy link
Copy Markdown
Owner Author

Heads-up on CI: lint-changed-files is red due to pre-existing style lints in chapters/exr-needle-sharing-extensions.qmd that predate this PR and are not in code I added — = assignments (lines 4, 32) and the glm1.nb object name (flagged by the custom object_name_linter), plus trailing blank lines in the existing chunks. Per the repo convention I didn't rewrite the author's established = style in code I didn't author. Happy to do a separate lint-cleanup pass on that file if you'd like it green.

@d-morrison

Copy link
Copy Markdown
Owner Author

@claude review

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

👀 Picked up by workflow run #26915836817. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that.

…structure

Recast the needle-sharing DAG around homelessness as the exposure of interest
(following Vittinghoff §8.3.1) and the cross-sectional study design:

- Mark `homeless` as exposure and `shared_syr` as outcome.
- Cross-sectional => the current covariates can't be ordered causally; each is
  drawn as a proxy for an unobserved *past state* (`U_x`, hollow node) that
  causes both the current measurement and the 30-day outcome. These latent
  past states are the confounders.
- `nivdu` (injection frequency) is the only genuine mediator: a shared syringe
  is a subset of injections.
- Add a conditioned-on study-inclusion node (`selected`); several covariates
  affect being sampled, so conditioning on it induces selection bias.
- All directed edges (no bidirected), which also removes the ggdag
  double-arrow rendering ambiguity from the prior version.

Update the caption and notes to explain the cross-sectional reading, the single
mediator, the latent-past confounders, and selection bias.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 5m 41s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read current state of changed files and latest commits
  • Cross-check prior review findings against current state
  • Run fresh multi-angle review
  • Post findings

All 27 prior findings remain resolved. The latest commit (6188640 — "Redesign fig-needles-dag: cross-sectional, homelessness-focal causal structure") is a significant and welcome improvement, but it introduces two issues in chapters/exr-needle-sharing-extensions.qmd by making the ZINB ::: notes block stale. One issue has been flagged in every review since 2026-06-03 19:40 UTC (eight rounds).


Still open (eighth consecutive round): ZINB ::: notes ZIP inconsistency — chapters/exr-needle-sharing-extensions.qmd, lines 79–86

The ZINB notes block lists as its first reason for using ziformula = ~ 1:

The DAG gives it no covariates. @fig-needles-dag posits a single process in which the covariates act on shared_syr (directly and through the mediator nivdu); it does not posit a separate latent class of "structural never-sharers" for those covariates to predict. So a covariate-laden zero-inflation submodel is not identified by the assumed causal structure.

But the ZIP model immediately above (line 39) uses ziformula = ~ age + sex + homeless + polydrug — four covariates in the zero-inflation submodel — under the exact same DAG. The notes never explain why the DAG argument applies to ZINB but not to ZIP. A student reading sequentially hits the DAG justification for ziformula = ~ 1 in ZINB and immediately wonders why ZIP used the same covariates in its zero-inflation submodel.

The ZINB-specific separation issue (bullet 2: "the NB overdispersion parameter and the zero-inflation component both compete to explain the excess zeros") is the substantive reason the two models differ; the DAG argument is the same for both.

Fix: reframe so the separation reason is primary and ZINB-specific, with the DAG argument presented as a secondary note applying to both models and a sentence explaining that ZIP retains covariates for illustration despite the DAG not supporting them:

::: notes
Unlike the zero-inflated Poisson model above,
here the zero-inflation submodel uses only an intercept (`ziformula = ~ 1`).

The main reason is ZINB-specific:
in the negative binomial, the overdispersion parameter and the
zero-inflation component **both compete to explain excess zeros**.
Including the four covariates in the zero-inflation submodel causes
the logistic fit to *separate* — coefficients diverge and the
exponentiated estimates run off to $\pm\infty$.
Reducing to an intercept-only submodel removes the separation.

A second, structural reason applies to **both** zero-inflated models:
`@fig-needles-dag` posits a single data-generating process;
it does not posit a separate latent class of structural never-sharers.
The ZIP model above retains covariates in its zero-inflation submodel
for illustration purposes, not because the causal structure supports it.
:::

Fix this →


New issue: ZINB ::: notes description of @fig-needles-dag no longer matches the actual DAG — chapters/exr-needle-sharing-extensions.qmd, line 80–81

The ZINB notes (written before commit 6188640) describe the DAG as:

@fig-needles-dag posits a single process in which the covariates act on shared_syr (directly and through the mediator nivdu)

But commit 6188640 completely redesigns fig-needles-dag. In the new DAG:

  • The current measured covariates (homeless, polydrug, hivstat, etc.) are proxies for unobserved latent past states (U_x)
  • It is the U_x nodes that cause shared_syr, not the current covariates directly
  • nivdu is a mediator only for the U_homeless → nivdu → shared_syr path, not for all covariates
  • The figure caption explicitly says "each [covariate] is a proxy for an unobserved past state (U_…) that is the actual cause of both the current measurement and the 30-day behaviour"

The ZINB notes description of the DAG as "covariates act on shared_syr directly and through nivdu" is now factually wrong about the figure it references. Any student who looks at @fig-needles-dag and then reads the ZINB notes will find the description inconsistent with the figure they just saw.

This issue is compounded with the ZIP consistency issue above — both should be addressed in the same fix.

Fix this →


New DAG (6188640) — everything else looks good

The redesigned fig-needles-dag introduces a cross-sectional latent-cause structure:

  • All-directed edges: the bidirected-edge rendering concern from prior rounds is fully resolved. ✓
  • U_x latent past states (paste0("U_", timevarying)) correctly created; all 6 are covered. ✓
  • Coordinates: setNames(y6, past) and setNames(y6, timevarying) correctly align each latent past state at the same y-coordinate as its measured proxy. ✓
  • case_when role assignment: all 6 roles (exposure, outcome, mediator, study inclusion, latent past state, measured covariate) are covered and correctly mapped. The 5 manual scales (scale_colour_manual, scale_shape_manual) enumerate all 6 roles. ✓
  • outer(demographics, past, ...) → 18 edges from demographics to past states; as.vector() flattens the matrix to a character vector as required by paste() inside dagitty(). ✓
  • selected (study inclusion): placed at x=3, y=-4.7. This is well below the time-varying node range (y ∈ [−3.5, 3.5]); ggdag should still render it correctly (sink nodes appear as name rows with to = NA in tidy_dagitty()). Worth verifying in the rendered preview that the selected node, its label, and the arrows to it are legible at fig-height = 6.
  • Caption and ::: notes: correctly describe the cross-sectional reading, latent-cause confounders, single mediator, and selection bias. ✓

All 27 prior findings confirmed resolved

# Issue Status
1 \exp{\ci} two explicit endpoints
2 Subscript mismatch \beta_j \in [\ci]
3 \dsim macro
4–5 T-conditioning in zero-inflation exercises
6–7 {{< slidebreak >}} in both files
8 sqrtninj duplicate label
9 \hat{\v{\mu}} hat-atop-vector notation
10 LTE inner expectations carry (X=x,T=t)
11 β₀ notation collision Wald/LRT
12 μ vs μ₀ inconsistency
13 fig-needles-raw div format
14 offset(log(nivdu)) inside formula
15 Overdispersion conclusion conditioning
16 Vector estimator convention in CLAUDE.md
17–18 Slide breaks and tbl-needles-count div format
19 Unreferenced tables
20 shsyr label clarified
21 fig-needles rate vs count narrative inconsistency
22 Uncited Introduction claims
23 tbl-compare-poisson-nb tibble mismatch
24 Introduction omits hivstat from covariate list
25 Stale ::: notes text contradicts expanded DAG
26 Sex IRR paragraph omits hivstat from adjustment list
27 hivstat in ZINB ziformula convergence risk

Following Vittinghoff §8.3.1 (which models shared_syr on homelessness) and the
cross-sectional DAG, recenter the analysis on homelessness as the exposure of
interest:

- Fit shared_syr ~ homeless + age + sex + ethn + polydrug + hivstat +
  sexabuse + hplsns: homelessness as exposure, the other measured covariates
  as proxies for the latent past confounders (all except the mediator nivdu).
  Add sexabuse to the data prep. Exclude dprsn_dx, whose 1/5 coding is
  undocumented in the source data (the book uses only homeless).
- Headline the homelessness IRR in the interpretation (adjusted ~6, vs
  Vittinghoff's unadjusted ~3.3); add ethn/sexabuse/hplsns coefficients.
- Recast the fitted-model figure as the homeless-vs-not contrast over age
  (other covariates at reference), on the pseudo-log scale.
- NB/ZIP/ZINB use the same count model; both zero-inflation submodels use
  ziformula = ~ homeless, matching Vittinghoff's inflate(i.homeless). This
  also resolves the review finding that the old ZINB note (ziformula = ~ 1)
  contradicted the ZIP submodel: both are now consistent and book-faithful.
- Clean up pre-existing = / glm1.nb lint in the rewritten extensions chunks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
@claude

claude Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 5m 53s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read diff and changed files
  • Cross-check prior review findings against current state
  • Run multi-angle review (line-by-line, removed-behavior, cross-file)
  • Verify candidates
  • Post findings

Commit 7d2fcde ("Reframe needle-sharing example around homelessness as the exposure") is the only new commit since the last review and is a substantive improvement. The critical ZINB/ZIP ziformula inconsistency that was open for eight consecutive rounds is now fully resolved. Two new issues introduced by this commit are below.


1. Introduction sentence omits ethn, sexabuse, and hplsnschapters/exr-needle-sharing.qmd, lines 59–62

The model is now shared_syr ~ homeless + age + sex + ethn + polydrug + hivstat + sexabuse + hplsns (eight predictors plus intercept). The introduction sentence at lines 59–62 reads:

"We examine how needle-sharing frequency relates to participants' age, sex, housing status (homelessness), drug-use patterns, and HIV serostatus;"

This omits ethn (ethnicity), sexabuse (sexual-abuse history), and hplsns (hopelessness score). The model prose at lines 407–413 correctly lists all covariates in a categorised form, but a student reading only the Introduction will not know these three predictors are in the model.

The same issue arose when hivstat was added in commit a500a71 (fixed in a95e486). The same pattern has recurred.

Fix: Extend the covariate list, e.g.:

"…age, sex, ethnicity, housing status (homelessness), drug-use patterns, HIV serostatus, sexual-abuse history, and hopelessness score;"

Fix this →


2. sexabuse and hplsns interpretation paragraphs are grammatically incomplete — chapters/exr-needle-sharing.qmd, lines 546–554

The sexual-abuse and hopelessness paragraphs are noun phrases without predicates:

Sexual-abuse history. $\exp{…} = r …$: the expected count for participants reporting a history of sexual abuse relative to those not reporting one, adjusting for the other covariates.

Hopelessness. $\exp{…} = r …$: the multiplicative change in the expected count per one-point increase in the hopelessness score.

Compare with the sex and homelessness paragraphs, which complete the sentence:

"women's expected count of shared syringes is about r … times men's"
"homeless participants share syringes about r … times as often as the non-homeless"

The sexabuse and hplsns paragraphs describe what the IRR represents but never say what it IS relative to the comparison group. The inline R value fills in the number, but without a predicate the sentence doesn't convey whether the IRR is greater or less than 1, or what it measures.

Fix: Complete both sentences with a predicate, e.g.:

Sexual-abuse history. $\exp{…} = r …$: participants reporting sexual abuse are expected to share syringes about r … times as often as those not reporting it, adjusting for the other covariates.

Hopelessness. $\exp{…} = r …$: each one-point increase in the hopelessness score multiplies the expected count by r …, adjusting for the other covariates.

Fix this →


Prior finding resolved: ZINB/ZIP ziformula consistency — chapters/exr-needle-sharing-extensions.qmd

Both zero-inflated models now use ziformula = ~ homeless, matching Vittinghoff's inflate(i.homeless). The ZINB notes block (lines 71–84) correctly explains:

  • Both models use the same parsimonious ziformula = ~ homeless.
  • The ZINB-specific reason for keeping it parsimonious (the NB overdispersion parameter and the zero-inflation component compete, causing separation with more covariates) is clearly stated.
  • No DAG argument is presented as the primary justification, resolving the logical contradiction with the ZIP model.

This was flagged in every review round from 2026-06-03 19:40 UTC (nine rounds). ✓


All 27 prior findings confirmed resolved

# Issue Status
1 \exp{\ci} renders as single expression, not two endpoints
2 Subscript mismatch \beta_j \in [\ci]
3 \dsim macro instead of raw \dot\sim
4–5 T-conditioning in zero-inflation exercises
6–7 {{< slidebreak >}} in both files
8 sqrtninj duplicate label clarified
9 \hat{\v{\mu}} hat-atop-vector notation
10 LTE inner expectations carry (X=x,T=t)
11 β₀ notation collision Wald/LRT
12 μ vs μ₀ inconsistency
13 fig-needles-raw div format
14 offset(log(nivdu)) inside formula
15 Overdispersion conclusion conditioning
16 Vector estimator convention in CLAUDE.md
17–18 Slide breaks and tbl-needles-count div format
19 Unreferenced tables
20 shsyr label clarified
21 fig-needles rate vs count narrative inconsistency
22 Uncited Introduction claims
23 tbl-compare-poisson-nb tibble mismatch
24 Introduction omits hivstat from covariate list
25 Stale ::: notes text contradicts expanded DAG
26 Sex IRR paragraph omits hivstat from adjustment list
27 hivstat in ZINB ziformula — convergence risk

Everything else in 7d2fcde looks good

  • sexabuse correctly coded case_match(1 ~ "yes", 0 ~ "no") |> factor() |> relevel(ref = "no"). ✓
  • ref_lvl() helper correctly picks the first (reference) level from glm1$model. ✓
  • pred_grid correctly holds sex, ethn, polydrug, hivstat, sexabuse at reference, hplsns at its median, and expands over both levels of homeless. ✓
  • predict(glm1, newdata = pred_grid, type = "response") with no offset correctly returns expected counts directly. ✓
  • glm1_nb formula (extensions, line 4) matches glm1 exactly; tbl-compare-poisson-nb lengths are equal. ✓
  • Both zinf_pois and zinf_nb use ziformula = ~ homeless, consistent with each other and with Vittinghoff. ✓
  • DAG ::: notes block (lines 219–246) correctly explains homelessness as focal exposure, nivdu as mediator, and cross-sectional reading of remaining covariates. ✓
  • Model prose at lines 407–413 correctly lists all eight covariates and explains the exclusion of dprsn_dx. ✓
  • Homelessness IRR interpretation correctly notes the unadjusted vs adjusted comparison to Vittinghoff's ≈ 3.3. ✓
  • renv.lock changes (bit64 4.8.0→4.8.2, clipr 0.8.0→0.8.1, cpp11 0.5.4→0.5.5) are minor version bumps; no incompatibilities expected. ✓
  • All {{< slidebreak >}} shortcodes in new content correct; remaining --- markers in both files are pre-existing. ✓

@d-morrison

Copy link
Copy Markdown
Owner Author

@claude review again — substantial reframe pushed:

  • Homelessness is now the focal exposure (per Vittinghoff §8.3.1). Model: shared_syr ~ homeless + age + sex + ethn + polydrug + hivstat + sexabuse + hplsns — homelessness as exposure, the other measured covariates as proxies for the latent past confounders in the DAG (all except the mediator nivdu). Excluded dprsn_dx (its 1/5 coding is undocumented in the source data; the book uses only homeless). Adjusted homelessness IRR ≈ 6 (vs Vittinghoff's unadjusted ≈ 3.3).
  • DAG redesigned for the cross-sectional design: homeless exposure, shared_syr outcome, latent past-state nodes (U_*) as the confounders, nivdu the only mediator, and a conditioned-on selected (study-inclusion) node inducing selection bias. All directed edges (no bidirected → resolves the prior ggdag double-arrow note).
  • Prior review finding resolved: both ZIP and ZINB now use ziformula = ~ homeless (matching inflate(i.homeless)), so the ZINB note no longer contradicts the ZIP submodel; the zero-inflation ORs are finite.
  • Fitted-model figure recast as the homeless-vs-not contrast over age (pseudo-log).
  • Cleaned up the pre-existing =/glm1.nb lint in the rewritten extensions chunks.

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

👀 Picked up by workflow run #26972168642. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that.

@d-morrison

Copy link
Copy Markdown
Owner Author

@claude review

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

👀 Picked up by workflow run #27039677878. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that.

@claude

claude Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 3m 51s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read diff and changed files
  • Cross-check prior review findings against current state
  • Run multi-angle review (line-by-line, removed-behavior, cross-file)
  • Verify candidates
  • Post findings

All 27 findings from prior review rounds remain resolved. Two findings from the last round (2026-06-04T18:42:09Z) remain open — both introduced by commit 7d2fcde ("Reframe needle-sharing example around homelessness as the exposure") and neither addressed since. The merge commit 8d2d87a (bringing in main) touched only workflow files, _sec_linear_algebra.qmd, and unrelated chapters — no new issues introduced.


1. Introduction sentence still omits ethn, sexabuse, and hplsnschapters/exr-needle-sharing.qmd, lines 59–60

The current model (glm1, line 417–418) includes eight predictors:

shared_syr ~ homeless + age + sex + ethn + polydrug + hivstat + sexabuse + hplsns

The Introduction sentence reads:

"We examine how needle-sharing frequency relates to participants' age, sex, housing status (homelessness), drug-use patterns, and HIV serostatus;"

ethn (ethnicity), sexabuse (sexual-abuse history), and hplsns (hopelessness score) are all in the fitted model but absent from this overview sentence. A student reading only the Introduction will expect five predictors and find eight in the model specification with no warning.

This same pattern recurred when hivstat was added in a500a71 and caught in a95e486; it has recurred again with the latest model expansion.

Fix: Extend line 60 to name all model predictors, e.g.:

"age, sex, ethnicity, housing status (homelessness), drug-use patterns, HIV serostatus, sexual-abuse history, and hopelessness score;"

Fix this →


2. sexabuse and hplsns interpretation paragraphs lack predicates — chapters/exr-needle-sharing.qmd, lines 546–554

Both paragraphs are noun phrases without a main clause:

Sexual-abuse history (lines 547–549):

$\exp{…} = `r …`$:
the expected count for participants reporting a history of sexual abuse
relative to those not reporting one, adjusting for the other covariates.

Hopelessness (lines 552–554):

$\exp{…} = `r …`$ per point:
the multiplicative change in the expected count per one-point increase
in the hopelessness score.

Compare with the sex and homelessness paragraphs which complete the thought:

  • "women's expected count of shared syringes is about r … times men's" (line 515)
  • "homeless participants share syringes about r … times as often as the non-homeless" (line 521)

A student reading the sexabuse and hplsns paragraphs cannot tell from the prose alone whether the IRR is greater or less than 1. The inline R expression fills in the number but without a predicate the sentence doesn't convey direction or interpretation.

Fix: Complete both sentences with predicates, e.g.:

Sexual-abuse history. $\exp{…} = r …$: participants with a history of sexual abuse share syringes about r … times as often as those without, adjusting for the other covariates.

Hopelessness. $\exp{…} = r …$ per point: each one-point increase in the hopelessness score multiplies the expected count by r …, adjusting for the other covariates.

Fix this →


Everything else looks good

  • All 27 previously-reported findings remain resolved.
  • Merge commit 8d2d87a (bring main into branch) introduces no new issues in the PR's changed files.
  • ZINB notes block correctly makes the separation issue (NB overdispersion + ZI component competition) primary; the DAG argument no longer contradicts the ZIP model above it. ✓
  • ziformula = ~ homeless in both ZIP and ZINB, matching @vittinghoff2e's inflate(i.homeless). ✓
  • glm1_nb formula matches glm1 exactly; tbl-compare-poisson-nb tibble lengths equal. ✓
  • All {{< slidebreak >}} shortcodes in new content correct; no new raw --- markers in either file. ✓
  • Overdispersion typo fixes ("if if", "value is predicted") correct.
  • \signt and \Expp macro substitutions correct.
  • Zero-inflation PMF and variance derivations mathematically correct throughout.
  • LRT formula: G² = 2[ℓ̂₁ − ℓ̂₀] with \dsim, M₁/M₀ naming, parenthetical disambiguation. ✓
  • Vector estimator notation rule documented in CLAUDE.md; \hat{\v{\mu}} used correctly in _exr-prac-glm-score.qmd. ✓
  • references.bib entry for tsui2009hcv correctly formed with {C} and {San Francisco} brace-protection. ✓

@claude

claude Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 4m 33s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read diff and changed files
  • Cross-check prior review findings against current state
  • Check latest commits for new issues
  • Post findings

The latest commit (03effe5) is a merge of origin/main into the branch. It touches only .github/workflows/claude.yml, a new .github/skills/ratio-factor-migrate/SKILL.md, minor CLAUDE.md instruction updates, _quarto-website.yml, and the latex-macros submodule pointer — none of which affect the book content. No new issues were introduced.

Two issues flagged in the prior round (2026-06-05T21:43:28Z) remain open and unaddressed.


1. Introduction sentence omits three model predictors — chapters/exr-needle-sharing.qmd, lines 59–60

The fitted model (glm1, line 417–418) has eight predictors:

shared_syr ~ homeless + age + sex + ethn + polydrug + hivstat + sexabuse + hplsns

The Introduction sentence reads:

"We examine how needle-sharing frequency relates to participants' age, sex, housing status (homelessness), drug-use patterns, and HIV serostatus;"

ethn (ethnicity), sexabuse (sexual-abuse history), and hplsns (hopelessness score) are all in glm1 but absent from this sentence. The model prose at lines 407–413 correctly lists all eight covariates; the Introduction sentence does not. A student reading only the Introduction will arrive at the model specification expecting five predictors and find eight.

This same pattern occurred when hivstat was added in commit a500a71 and was caught and fixed in a95e486; it has recurred after the latest model expansion.

Fix: Change line 60 to name all predictors, e.g.:

"…age, sex, ethnicity, housing status (homelessness), drug-use patterns, HIV serostatus, sexual-abuse history, and hopelessness score;"

Fix this →


2. sexabuse and hplsns interpretation paragraphs are noun phrases without predicates — chapters/exr-needle-sharing.qmd, lines 546–554

Both paragraphs describe what the IRR represents but never say what it is:

Sexual-abuse history (lines 547–549):

$\exp{…} = `r …`$:
the expected count for participants reporting a history of sexual abuse
relative to those not reporting one, adjusting for the other covariates.

Hopelessness (lines 552–554):

$\exp{…} = `r …`$ per point:
the multiplicative change in the expected count per one-point increase
in the hopelessness score.

Compare with the sex and homelessness paragraphs, which complete the thought with a predicate:

  • "women's expected count of shared syringes is about r … times men's" (line 514)
  • "homeless participants share syringes about r … times as often as the non-homeless" (line 520)

Without a predicate, a student cannot tell from the prose whether the IRR is greater or less than 1 — the inline R expression fills in the number, but the sentence gives no direction or interpretation. Note that the preceding ethn paragraph (lines 541–544) also omits the number entirely since the coefficients are "not of separate interest", but that is intentional; sexabuse and hplsns are being interpreted here.

Fix: Add predicates to both sentences, e.g.:

Sexual-abuse history. $\exp{…} = r …$: participants with a history of sexual abuse share syringes about r … times as often as those without, adjusting for the other covariates.

Hopelessness. $\exp{…} = r …$ per point: each one-point increase in the hopelessness score multiplies the expected count by r …, adjusting for the other covariates.

Fix this →


All 27 prior findings confirmed resolved

# Issue Status
1 \exp{\ci} renders as single expression, not two endpoints
2 Subscript mismatch \beta_j \in [\ci]
3 \dsim macro instead of raw \dot\sim
4–5 T-conditioning in zero-inflation exercises
6–7 {{< slidebreak >}} in both files
8 sqrtninj duplicate label clarified
9 \hat{\v{\mu}} hat-atop-vector notation
10 LTE inner expectations carry (X=x,T=t)
11 β₀ notation collision Wald/LRT
12 μ vs μ₀ inconsistency
13 fig-needles-raw div format
14 offset(log(nivdu)) inside formula
15 Overdispersion conclusion conditioning
16 Vector estimator convention in CLAUDE.md
17–18 Slide breaks and tbl-needles-count div format
19 Unreferenced tables
20 shsyr label clarified
21 fig-needles rate vs count narrative inconsistency
22 Uncited Introduction claims
23 tbl-compare-poisson-nb tibble mismatch
24 Introduction omits hivstat from covariate list
25 Stale ::: notes text contradicts expanded DAG
26 Sex IRR paragraph omits hivstat from adjustment list
27 hivstat in ZINB ziformula — convergence risk

Everything else looks good

  • Overdispersion typo fixes ("if if", "value is predicted") correct.
  • \signt and \Expp macro substitutions correct.
  • First-person rewrite in Quasipoisson section reads well; NB content moved out of ::: notes correct.
  • Zero-inflation PMF and variance derivations (Law of Total Expectation/Variance, with explicit (X=x,T=t) conditioning restored in the final lines) mathematically correct throughout.
  • LRT formula: $G^2 = 2[\hat\ell_1 - \hat\ell_0]$ with \dsim, M₁/M₀ naming, parenthetical disambiguating model-index from scalar Wald null β₀. Correct.
  • Rate-ratio CI uses two explicit endpoints; log-scale CI uses ± shorthand — both internally consistent.
  • glm1 (shared_syr ~ homeless + age + sex + ethn + polydrug + hivstat + sexabuse + hplsns, no offset) consistent with Vittinghoff §8.3.1 and the mediator argument. ✓
  • glm1_nb formula matches glm1; tbl-compare-poisson-nb tibble lengths equal. ✓
  • Both ZIP and ZINB use ziformula = ~ homeless, matching Vittinghoff's inflate(i.homeless); ZINB ::: notes correctly makes the separation issue (NB overdispersion competing with ZI component) the primary reason for the parsimonious submodel. ✓
  • DAG (fig-needles-dag) redesigned to cross-sectional latent-cause structure; all-directed edges resolve the prior bidirected-arrow rendering concern. ✓
  • @tsui2009hcv correctly hedged as "possible (though unconfirmed)"; BibTeX entry complete. ✓
  • \hat{\v{\mu}} (hat atop tilde-vector symbol) correct per documented convention in CLAUDE.md. ✓
  • All {{< slidebreak >}} shortcodes in new content correct; all remaining --- markers in both files are pre-existing.
  • Merge commit 03effe5 (main → branch) introduces no new issues in the book content.

@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 4m 34s —— View job


Code Review: PR #860 — Revise count regression chapter

Tasks

  • Gather context and understand request
  • Read diff and changed files
  • Cross-check prior open findings against current state
  • Run fresh multi-angle review
  • Post inline findings and summary

All 27 prior findings remain resolved. The most recent merge of main into the branch (141e3ab) touched only workflow/CI files and CLAUDE.md — no book-content changes in the PR's changed files. Two findings from the last round (2026-06-04T18:42:09Z) remain open and are still unaddressed.


1. Introduction sentence still omits three model predictors — chapters/exr-needle-sharing.qmd, lines 59–60

The fitted model (glm1, line 417–418) has eight predictors:

shared_syr ~ homeless + age + sex + ethn + polydrug + hivstat + sexabuse + hplsns

The Introduction sentence currently reads:

"We examine how needle-sharing frequency relates to participants' age, sex, housing status (homelessness), drug-use patterns, and HIV serostatus;"

ethn (ethnicity), sexabuse (sexual-abuse history), and hplsns (hopelessness score) are in glm1 but absent here. The model prose at lines 407–410 lists all covariates correctly; the Introduction does not match. A student reading only the Introduction will expect five predictors and find eight in the model specification.

This same pattern occurred when hivstat was added in commit a500a71 and was caught and fixed in a95e486; it has recurred after the subsequent model expansion.

Fix: Change line 60 to name all predictors:

"…age, sex, ethnicity, housing status (homelessness), drug-use patterns, HIV serostatus, sexual-abuse history, and hopelessness score;"

Fix this →


2. sexabuse and hplsns interpretation paragraphs lack predicates — chapters/exr-needle-sharing.qmd, lines 546–554

Both paragraphs are noun phrases with no main verb:

Sexual-abuse history (lines 547–549):

$\exp{…} = `r …`$:
the expected count for participants reporting a history of sexual abuse
relative to those not reporting one, adjusting for the other covariates.

Hopelessness (lines 552–554):

$\exp{…} = `r …`$ per point:
the multiplicative change in the expected count per one-point increase
in the hopelessness score.

Compare with the sex and homelessness paragraphs, which complete the thought:

  • "women's expected count of shared syringes is about r … times men's" (line 514–515)
  • "homeless participants share syringes about r … times as often as the non-homeless" (line 520–521)

Without a predicate, a student cannot tell from the prose alone whether the IRR is greater or less than 1.

Fix: Add predicates to both, e.g.:

Sexual-abuse history. $\exp{…} = r …$: participants with a history of sexual abuse share syringes about r … times as often as those without, adjusting for the other covariates.

Hopelessness. $\exp{…} = r …$ per point: each one-point increase in the hopelessness score multiplies the expected count by r …, adjusting for the other covariates.

Fix this →


All 27 prior findings confirmed resolved

# Issue Status
1 \exp{\ci} renders as single expression, not two endpoints
2 Subscript mismatch \beta_j \in [\ci]
3 \dsim macro instead of raw \dot\sim
4–5 T-conditioning in zero-inflation exercises
6–7 {{< slidebreak >}} in both files
8 sqrtninj duplicate label clarified
9 \hat{\v{\mu}} hat-atop-vector notation
10 LTE inner expectations carry (X=x,T=t)
11 β₀ notation collision Wald/LRT
12 μ vs μ₀ inconsistency
13 fig-needles-raw div format
14 offset(log(nivdu)) inside formula
15 Overdispersion conclusion conditioning
16 Vector estimator convention in CLAUDE.md
17–18 Slide breaks and tbl-needles-count div format
19 Unreferenced tables
20 shsyr label clarified
21 fig-needles rate vs count narrative inconsistency
22 Uncited Introduction claims
23 tbl-compare-poisson-nb tibble mismatch
24 Introduction omits hivstat
25 Stale ::: notes text contradicts DAG
26 Sex IRR paragraph omits hivstat
27 ZINB ziformula convergence risk

Latest merges: no new issues

  • Merge 141e3ab (main → branch): changed only .github/workflows/, CLAUDE.md, skill files, and renv.lock. No book-content changes in the PR's changed files. ✓
  • ZINB ::: notes block — now correctly uses both ZIP and ZINB with ziformula = ~ homeless and explains the ZINB-specific separation issue as the primary reason for parsimony. Prior ZIP/DAG inconsistency fully resolved. ✓
  • \hat{\v{\mu}} (hat atop tilde-vector) is correct per the newly documented CLAUDE.md convention. ✓
  • probability.qmd#def-offset cross-link (replacing orphan poisson.qmd#def-offset) is valid. ✓
  • DAG redesigned to cross-sectional latent-cause structure with all-directed edges; bidirected rendering concern is moot. ✓

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor
PR Preview Action v1.8.1
🚀 Preview available at: https://d-morrison.github.io/rme/pr-preview/pr-860/

Built to branch gh-pages at 2026-06-09 08:12 UTC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants