
Add Model 2a: same-parameter cross-biomarker covariance as a JAGS extension of model.jags (strictly nests Chapter 1) + Chapter-1 comparison#245

Open
Kwan-Jenny wants to merge 7 commits into main from chapter2-extend


@Kwan-Jenny

Collaborator

Summary

Adds Model 2a, an additive extension of model.jags that introduces same-parameter cross-biomarker covariance into the subject-level antibody kinetic parameters while leaving every Chapter 1 within-biomarker block unchanged. Chapter 1 is recovered exactly when the new covariances are zero, so Model 2a strictly nests the current model. It also adds compare_mod_2a(), a head-to-head comparison of Chapter 1 and Model 2a fitted to the same data.

Nothing in the existing Chapter 1 pipeline is modified — model.jags, prep_data(), prep_priors(), run_mod() are untouched; all code is new.

What Model 2a is

Chapter 1 covariance is biomarker-separated (cross-biomarker = 0):

Sigma = [ Sigma_G   0     ]
        [   0     Sigma_A  ]

Model 2a keeps Sigma_G, Sigma_A and adds a diagonal cross-biomarker
block C = diag(c_1..c_P) (same-parameter only):

Sigma = [ Sigma_G   C      ]      C = diag(c_p)
        [   C^T    Sigma_A  ]

c_p is the IgG~IgA covariance for the same kinetic parameter p; cross-parameter cross-biomarker terms stay 0; C = 0 reproduces Chapter 1. With P = 5 kinetic parameters per biomarker this is 35 covariance parameters vs 30 (2 × 15 within-biomarker entries, plus 5 for C).
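The 30 vs 35 figures are consistent with P = 5 kinetic parameters per biomarker (inferred from the counts themselves). A quick check of that arithmetic, as a sketch:

```r
# Covariance-parameter count, assuming P = 5 kinetic parameters per biomarker
# and two biomarkers (inferred from the 30 vs 35 figures).
n_sym <- function(P) P * (P + 1) / 2  # free entries of one symmetric P x P block

P <- 5
n_ch1 <- 2 * n_sym(P)  # Sigma_G + Sigma_A; cross-biomarker block fixed at 0
n_2a  <- n_ch1 + P     # plus the diagonal cross block C = diag(c_1..c_P)

c(chapter1 = n_ch1, model_2a = n_2a)  # 30 and 35
```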

Implementation (factor parameterization)

A Wishart prior on the precision cannot enforce the exact zero pattern of a sparse covariance, so the pattern is generated with a shared latent factor per kinetic parameter:

par[i,k,p] = mu.par[k,p] + w[i,k,p] + lambda[k,p] * f[i,p]
w[i,k,1:P] ~ dmnorm(0, prec.par[k,,])   # Chapter 1 Wishart block, unchanged
f[i,p]     ~ dnorm(0, 1)                 # shared factor, one per parameter

For two biomarkers this gives c_p = lambda[1,p] * lambda[2,p]. Because c_p is derived from the loadings, it is a model quantity, independent of the sampler: it is the same c_p whether the model is implemented in JAGS or Stan. Cross-parameter cross-biomarker covariance is 0, and with all lambda = 0 the model is exactly model.jags. The first biomarker's loadings are constrained to be > 0 for identifiability; the sign of c_p stays free via lambda[2,p]. The model is general in n_antigen_isos: with 2 biomarkers it is exactly Model 2a; with more, the single per-parameter factor couples all of them (rank-1, same-parameter).

inst/extdata/model_2a.jags is identical to model.jags except for this par construction and the loading priors.
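To see which covariance the factor construction implies, here is a minimal base-R sketch. implied_sigma_2a() is a hypothetical helper written only for this illustration (not a package function); it assumes, as in the JAGS lines above, that the w blocks are independent across biomarkers and that Var(f[i,p]) = 1. It shows the cross-biomarker block is diag(lambda[1,p] * lambda[2,p]), cross-parameter cross-biomarker entries are 0, and the factor also adds lambda[k,p]^2 to each within-biomarker variance, which is why a marginal-variance step is needed before turning c_p into a correlation.

```r
# Hypothetical helper (illustration only, not the package API): implied
# covariance of the stacked vector (par[i,1,1:P], ..., par[i,K,1:P]) under
#   par[i,k,p] = mu.par[k,p] + w[i,k,p] + lambda[k,p] * f[i,p],
# with w[i,k,] ~ N(0, Sigma_w[[k]]) independent across k and f[i,p] ~ N(0, 1).
implied_sigma_2a <- function(Sigma_w, lambda) {
  K <- nrow(lambda)
  P <- ncol(lambda)
  S <- matrix(0, K * P, K * P)
  L <- matrix(0, K * P, P)  # loading matrix: row (k, p) loads only on factor p
  for (k in seq_len(K)) {
    idx <- (k - 1) * P + seq_len(P)
    S[idx, idx] <- Sigma_w[[k]]              # Chapter 1 within-biomarker block
    L[idx, ] <- diag(lambda[k, ], nrow = P)  # lambda[k,p] in column p
  }
  S + L %*% t(L)  # Var(w) + Lambda Var(f) Lambda^T
}

P <- 3
Sigma_w <- list(diag(P), diag(2, P))  # toy within-biomarker covariances
lambda <- rbind(c(0.5, 1.0, 0.2),     # biomarker 1: constrained > 0
                c(0.4, -0.6, 0.0))    # biomarker 2: sign free
S <- implied_sigma_2a(Sigma_w, lambda)
cross <- S[1:P, P + 1:P]  # biomarker 1 x biomarker 2 block
diag(cross)               # c_p = lambda[1,p] * lambda[2,p]: 0.2, -0.6, 0

S0 <- implied_sigma_2a(Sigma_w, 0 * lambda)  # all lambda = 0
all(S0[1:P, P + 1:P] == 0)                   # TRUE: Chapter 1 block-diagonal
```

Setting every loading to 0 returns the block-diagonal Chapter 1 covariance, which is the nesting property in matrix form.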

What's added

  • inst/extdata/model_2a.jags — the extended model.
  • Priors: add_factor_priors(), prep_priors_2a() (reuse Chapter 1 priors).
  • Fitting: jags_data_2a(), make_inits_2a(), run_mod_2a(), a lean wrapper that reuses prep_data() unchanged and returns the MCMC samples plus a tidy cross-biomarker covariance/correlation summary (no sr_model post-processing).
  • Post-processing (small pure helpers): cross_cov_from_loadings(), marginal_var_2a(), cross_cor_from_draw_2a(), summarize_cross_2a(), plus the shared internal file jags_node_utils_2a.R.
  • Simulation + validation: build_sigma_2a(), sim_params_2a(), sim_case_data_2a(), validate_recovery_2a(), validate_nesting_2a().
  • Comparison: fit_chapter1_lean() (same posterior as run_mod()), summarize_curve_params_2a(), compare_mod_2a().

No new dependencies (all in Imports already); no new data (uses nepal_sees).

Chapter 1 vs Model 2a — what changes

compare_mod_2a() fits both models on the same data and reports:

  • What stays the same: the shared population means mu.par and the within-biomarker variances. Because Model 2a strictly nests Chapter 1, these should agree within MCMC error; a large difference would signal a problem, not an improvement.
  • What Model 2a adds: the cross-biomarker covariance c_p / correlation rho_p per kinetic parameter, which Chapter 1 cannot represent (it is structurally 0 there).
  • Where improvement would show: the means are unchanged by design; the value of Model 2a lies in the covariance and in covariance-dependent quantities (credible intervals, joint posterior prediction, DIC/WAIC). A rigorous "is it better" evaluation (WAIC/LOO; downstream time-since-infection / seroincidence MAE/RMSE/coverage) is a planned follow-up, not part of this PR. compare_mod_2a(dic = TRUE) offers a best-effort DIC as a first signal.
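A sketch of how per-parameter rho_p summaries, and the "CI excludes 0" flag, could be formed from posterior draws. The draws below are simulated toy values, not output of run_mod_2a(), and the layout (one n_draw x P matrix per quantity) is an assumption; the package's summarize_cross_2a() may organise this differently. rho_p divides c_p by marginal SDs that include the lambda^2 contribution.

```r
# Toy stand-ins for posterior draws (NOT real model output): one n_draw x P
# matrix per quantity; this layout is an assumption for the sketch.
set.seed(42)
n_draw <- 4000
P <- 5
lam1 <- matrix(abs(rnorm(n_draw * P, 0.6, 0.1)), n_draw, P)  # biomarker 1, > 0
lam2 <- matrix(rnorm(n_draw * P, rep(c(0.5, 0, -0.4, 0, 0.3), each = n_draw), 0.1),
               n_draw, P)                                    # biomarker 2
v1 <- matrix(rgamma(n_draw * P, shape = 50, rate = 50), n_draw, P)  # Sigma_G[p,p]
v2 <- matrix(rgamma(n_draw * P, shape = 50, rate = 50), n_draw, P)  # Sigma_A[p,p]

c_draw <- lam1 * lam2                                     # c_p, draw by draw
rho_draw <- c_draw / sqrt((v1 + lam1^2) * (v2 + lam2^2))  # marginal SDs include lambda^2

# Median and 95% interval, looked up by quantile name rather than position.
summ <- function(x) {
  q <- stats::quantile(x, c(0.025, 0.5, 0.975))
  c(median = q[["50%"]], lower = q[["2.5%"]], upper = q[["97.5%"]])
}
cross <- data.frame(param = paste0("p", seq_len(P)), t(apply(rho_draw, 2, summ)))
cross$excludes_0 <- cross$lower > 0 | cross$upper < 0
cross
```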

Comparison results on nepal_sees (full-length MCMC, run locally):

(to be filled in from the local comparison run: convergence (max R-hat), the shared-parameter table with max |diff|, the cross-biomarker c_p/rho_p table and which CIs exclude 0, and the mu.par CI-width summary)

Verification

  • CI runs the pure-function unit tests + a tiny-MCMC smoke test that run_mod_2a() / compare_mod_2a() execute and return the expected structure (matching the existing test-run_mod.R style; CI installs JAGS).
  • Recovery of a known cross-biomarker correlation from simulated data, the no-false-positive check on independent data, and the Chapter 1 vs Model 2a comparison on nepal_sees are run locally at full MCMC length (validate_recovery_2a(), validate_nesting_2a(), compare_mod_2a()); results are to be pasted into the comparison section above.
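A lightweight Monte Carlo analogue of the recovery and no-false-positive checks. It works directly on simulated kinetic parameters for one parameter p rather than on antibody curves fitted through JAGS, so it exercises only the covariance algebra, not validate_recovery_2a() itself. Loadings and SDs are arbitrary illustration values.

```r
# Monte Carlo check of c_p = lambda[1,p] * lambda[2,p] for a single kinetic
# parameter p; mu.par is dropped because it does not affect covariances.
set.seed(2026)
n <- 50000
lambda <- c(0.8, -0.5)  # loadings lambda[1,p] (> 0) and lambda[2,p]
sd_w <- c(1.0, 1.2)     # within-biomarker SDs, i.e. sqrt of Sigma_w[[k]][p,p]

f <- rnorm(n)                               # shared factor f[i,p]
x1 <- rnorm(n, 0, sd_w[1]) + lambda[1] * f  # par[i,1,p] - mu.par[1,p]
x2 <- rnorm(n, 0, sd_w[2]) + lambda[2] * f  # par[i,2,p] - mu.par[2,p]

c_true <- lambda[1] * lambda[2]  # -0.4
rho_true <- c_true / sqrt((sd_w[1]^2 + lambda[1]^2) * (sd_w[2]^2 + lambda[2]^2))
c(c_hat = cov(x1, x2), c_true = c_true, rho_hat = cor(x1, x2), rho_true = rho_true)

# No-false-positive analogue: biomarker 2 does not load on f, so c_p = 0.
x2_ind <- rnorm(n, 0, sd_w[2])
cov(x1, x2_ind)  # close to 0
```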

How to try it

library(serodynamics)
data(nepal_sees)              # already case_data; HlyE_IgG, HlyE_IgA
fit <- run_mod_2a(nepal_sees)
fit$cross                 # IgG ~ IgA covariance & correlation per parameter
cmp <- compare_mod_2a(nepal_sees)
cmp$shared; cmp$added     # shared params (Ch1 vs 2a) + the addition

Scope

This PR is the "add the model + show it nests Chapter 1 + compare to Chapter 1" increment. Quantifying improvement (WAIC/LOO, joint posterior-predictive comparison, seroincidence coverage/width) is a follow-up, since WAIC needs a model-file change and the seroincidence step needs the serocalculator integration.

…AGS extension of model.jags

Add Model 2a, an additive extension of model.jags that introduces
same-parameter cross-biomarker covariance via a shared per-parameter latent
factor, leaving every Chapter 1 within-biomarker block unchanged. Chapter 1 is
recovered exactly when the loadings are zero (strict nesting).

- model: inst/extdata/model_2a.jags
- fit: run_mod_2a() with jags_data_2a, make_inits_2a, prep_priors_2a,
  add_factor_priors; reuses prep_data() unchanged
- cross-biomarker summaries: summarize_cross_2a, cross_cov_from_loadings,
  marginal_var_2a, cross_cor_from_draw_2a (+ internal jags_node_utils)
- comparison vs Chapter 1: compare_mod_2a, fit_chapter1_lean,
  summarize_curve_params_2a
- simulation/validation: sim_params_2a, sim_case_data_2a, build_sigma_2a,
  validate_recovery_2a, validate_nesting_2a
- tests: 7 pure-function unit tests + 1 JAGS-gated smoke test
- example: run_mod_2a-examples.R (one fit on nepal_sees)

No new dependencies; no Chapter 1 code modified.

@codecov

codecov Bot commented Jun 13, 2026


Codecov Report

❌ Patch coverage is 85.25641% with 46 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines           Patch %   Missing lines
R/validate_recovery_2a.R             0.00%   15
R/validate_nesting_2a.R              0.00%   10
R/build_sigma_2a.R                  71.42%    6
R/summarize_cross_2a.R              83.78%    6
R/jags_data_2a.R                    69.23%    4
R/cross_cov_from_loadings.R         40.00%    3
R/compare_mod_2a.R                  97.22%    1
R/prep_priors_2a.R                  75.00%    1

File                               Coverage Δ
R/add_factor_priors.R              100.00% <100.00%> (ø)
R/cross_cor_from_draw_2a.R         100.00% <100.00%> (ø)
R/fit_chapter1_lean.R              100.00% <100.00%> (ø)
R/jags_node_utils_2a.R             100.00% <100.00%> (ø)
R/make_inits_2a.R                  100.00% <100.00%> (ø)
R/marginal_var_2a.R                100.00% <100.00%> (ø)
R/run_mod_2a.R                     100.00% <100.00%> (ø)
R/sim_case_data_2a.R               100.00% <100.00%> (ø)
R/sim_params_2a.R                  100.00% <100.00%> (ø)
R/summarize_curve_params_2a.R      100.00% <100.00%> (ø)
... and 8 more

Merge remote-tracking branch 'origin/main' into chapter2-extend

# Conflicts:
#	DESCRIPTION
#	inst/WORDLIST

@github-actions

github-actions Bot commented Jun 20, 2026

Contributor

📖 https://ucd-serg.github.io/serodynamics/preview/pr245
Preview documentation for this PR (at commit b3ba634)

Tooling/style only; model and functions unchanged. stop()->cli_abort,
line/indent/semicolon fixes, object_name_linter exemption for *_2a files,
sim_params tolerance 0.05, suppressWarnings on JAGS smoke tests,

@claude

claude Bot commented Jun 20, 2026

Contributor

Claude finished @Kwan-Jenny's task in 2m 11s.


Claude finished reviewing PR #245


Tasks

  • Load tools and set up
  • Read changed R source files and latest commits
  • Check if prior review issues were fixed
  • Check for new issues (style, correctness, tests)
  • Compile and post review

Code Review: Model 2a PR #245 — after commits c3d2a85 and eaa5a4a

The two latest commits (c3d2a85 — CI/lint/tolerance fixes, eaa5a4a — additional lint fix) addressed several items from the June 20 review. The JAGS model math and factor parameterization remain sound. Below is the current status of every open item.


Still open — 3 items

1. compare_mod_2a.R:60-64: dplyr::left_join() missing relationship argument (CLAUDE.md violation, blocking)

Unchanged from previous review. The .lintr.R file itself (lines 52–57) acknowledges this rule is enforced via code review rather than automatically, making this finding extra load-bearing.

# current (violates CLAUDE.md)
shared <- dplyr::left_join(
  s1, s2,
  by = c("biomarker", "param"),
  suffix = c("_ch1", "_2a")
)

# required
shared <- dplyr::left_join(
  s1, s2,
  by = dplyr::join_by(biomarker, param),
  suffix = c("_ch1", "_2a"),
  relationship = "one-to-one"
)
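A self-contained illustration of what relationship = "one-to-one" buys, using toy tibbles in place of the real s1/s2 summaries (the mean_med column and values are made up). join_by() and the relationship argument require dplyr >= 1.1.0.

```r
library(dplyr)  # join_by() and `relationship` need dplyr >= 1.1.0

# Toy per-(biomarker, param) summaries standing in for the two fits.
s1 <- tibble(biomarker = c("HlyE_IgG", "HlyE_IgA"), param = "y0",
             mean_med = c(1.1, 0.9))
s2 <- tibble(biomarker = c("HlyE_IgG", "HlyE_IgA"), param = "y0",
             mean_med = c(1.2, 0.8))

shared <- left_join(
  s1, s2,
  by = join_by(biomarker, param),
  suffix = c("_ch1", "_2a"),
  relationship = "one-to-one"
)
names(shared)  # "biomarker" "param" "mean_med_ch1" "mean_med_2a"

# With a duplicated key in s2, the join errors instead of silently
# duplicating rows of s1.
s2_dup <- bind_rows(s2, s2[1, ])
res <- tryCatch(
  left_join(s1, s2_dup, by = join_by(biomarker, param),
            relationship = "one-to-one"),
  error = function(e) "join refused: duplicated key"
)
res
```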



2. jags_data_2a.R:29-30 — Embedded newline in CLI warning (minor)

Still present. The string literal spans two lines with indentation whitespace that renders as a hard newline plus spaces in cli output:

# current — renders with a hard newline + spaces
"i" = "Found {n_bio} biomarkers; the shared factor will couple all of
       them (same-parameter, rank-1). For a clean pair, subset to two."

# fix — flatten to one line
"i" = "Found {n_bio} biomarkers; shared factor couples all (rank-1). Subset to 2 for a clean pair."

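If the flattened line then exceeds the repository's line-length lint limit (an assumption; the configured limit is not shown here), one option is to split the literal across source lines with paste0(), which still produces a single-line string. The headline text and the warn_multi_bio() wrapper below are hypothetical.

```r
# Hypothetical sketch: paste0() joins the pieces into ONE line, so cli sees no
# embedded newline, while each source line stays short.
multi_bio_hint <- paste0(
  "Found {n_bio} biomarkers; shared factor couples all (rank-1). ",
  "Subset to 2 for a clean pair."
)

warn_multi_bio <- function(n_bio) {
  # cli interpolates {n_bio} from this function's environment at call time.
  cli::cli_warn(c(
    "Model 2a expects two biomarkers.",  # hypothetical headline
    "i" = multi_bio_hint
  ))
}
```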


3. sim_case_data_2a.R:37,40 — Hardcoded n * 2 and for (k in 1:2) loop (robustness)

Still present. The biomarkers parameter accepts arbitrary-length input but the loop body only processes indices 1 and 2, silently dropping a third biomarker:

# line 37 — fix the pre-allocated size
rows <- vector("list", n * length(biomarkers))

# line 40 — fix the loop
for (k in seq_along(biomarkers)) {
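A minimal skeleton of the pattern this fix describes: pre-allocate n * length(biomarkers) slots and loop with seq_along(biomarkers). sim_rows_sketch(), its column names and the third biomarker name are hypothetical and only mirror the shape of the real simulator.

```r
# Hypothetical skeleton (not the package's sim_case_data_2a()): one row per
# subject x biomarker, sized and looped by length(biomarkers), never by 2.
sim_rows_sketch <- function(n, biomarkers) {
  rows <- vector("list", n * length(biomarkers))
  j <- 0L
  for (i in seq_len(n)) {
    for (k in seq_along(biomarkers)) {
      j <- j + 1L
      rows[[j]] <- data.frame(id = i, antigen_iso = biomarkers[k])
    }
  }
  do.call(rbind, rows)
}

out <- sim_rows_sketch(4, c("HlyE_IgG", "HlyE_IgA", "LPS_IgG"))
table(out$antigen_iso)  # each of the three biomarkers appears 4 times
```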



Partially open — low priority

4. summarize_cross_2a.R:72-74 — Positional probs row indexing

The length check (length(probs) != 3L) prevents obvious misuse, but cov_q[2, ] / cov_q[1, ] / cov_q[3, ] silently returns wrong results if probs = c(0.1, 0.9, 0.5), which passes the length check but reorders the quantiles. Using cov_q["50%", ] / cov_q["2.5%", ] etc. would make this robust to reordering at zero extra cost, since apply(..., stats::quantile) returns named rows. Note that hardcoded names only match those exact percentages: with other probs values the lookup errors rather than returning wrong rows.
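An alternative to name-based indexing, sketched under the assumption that the three probs are meant as lower, middle and upper: sort probs before calling quantile(), after which positional rows are safe for any ordering and any values. summarize_q() is a hypothetical stand-in, not the package function.

```r
# Hypothetical stand-in for the quantile step in summarize_cross_2a():
# sort probs first so rows 1/2/3 are always lower/middle/upper.
summarize_q <- function(draws, probs = c(0.025, 0.5, 0.975)) {
  stopifnot(length(probs) == 3L)
  probs <- sort(probs)
  q <- apply(draws, 2, stats::quantile, probs = probs)  # 3 x ncol(draws)
  data.frame(lower = q[1, ], median = q[2, ], upper = q[3, ])
}

set.seed(1)
draws <- matrix(rnorm(3000), nrow = 1000, ncol = 3)
a <- summarize_q(draws, probs = c(0.1, 0.9, 0.5))  # out-of-order probs
b <- summarize_q(draws, probs = c(0.1, 0.5, 0.9))
identical(a, b)  # TRUE
```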


Fixed in the latest commits ✓

All items from the June 20 review that were reported as fixed remain fixed: @returns doc for run_mod_2a, CI test tolerance, stop() -> cli::cli_abort() across all four files, summarize_cross_2a() NULL guard, fit_chapter1_lean.R memory footprint, and the base::merge() -> dplyr::left_join() replacement.

The .lintr.R exclusion block for _2a-named files (lines 95–108) is a clean solution to the snake_case naming constraint.


JAGS model and math ✓

inst/extdata/model_2a.jags is unchanged and mathematically correct. Factor parameterization, identifiability constraint (lambda[1,p] T(0,)), and Chapter 1 nesting (all lambda = 0) are all sound.


Summary

3 items remain open: items 1, 2 and 3 above. Item 1 (missing relationship on left_join) is a CLAUDE.md hard requirement and is blocking. Items 2 and 3 are robustness/style issues carried over from the June 13 and June 20 reviews. Item 4 is low priority.

@Kwan-Jenny

Collaborator Author

@claude Please fix the remaining open review comments 1–3 on branch chapter2-extend in one commit.

  1. In R/compare_mod_2a.R, lines 60–64, update the dplyr::left_join() call:
  • Replace by = c("biomarker", "param") with by = dplyr::join_by(biomarker, param).
  • Add relationship = "one-to-one" as an explicit argument to the join.

The corrected join should follow this structure:

shared <- dplyr::left_join(
  s1,
  s2,
  by = dplyr::join_by(biomarker, param),
  suffix = c("_ch1", "_aa"),
  relationship = "one-to-one"
)
  2. In R/jags_data_2a.R, lines 29–30, flatten the "i" CLI warning message so the string is written on one line with no embedded newline or leading indentation whitespace. Keep the same message meaning, but make it a single-line string.

Use this single-line version:

"i" = "found {n_bio} biomarkers; shared factor couples all (rank-1). subset to 2 for a clean pair."
  3. In R/sim_case_data_2a.R, remove the hardcoded assumption that there are exactly 2 biomarkers:
  • Replace line 37 with:
rows <- vector("list", n * length(biomarkers))
  • Replace line 40 with:
for (k in seq_along(biomarkers)) {

Please only make these targeted changes for comments 1–3.

@claude

claude Bot commented Jun 20, 2026

Contributor

Claude finished @Kwan-Jenny's task in 1m 8s.


Tasks

  • Read the three target files
  • Apply fix 1 in compare_mod_2a.R: join_by + relationship = "one-to-one"
  • Apply fix 2: jags_data_2a.R — flatten CLI message to a single line
  • Apply fix 3: sim_case_data_2a.R — replace n * 2 and 1:2 with length(biomarkers) / seq_along(biomarkers)
  • Commit and push

Summary

All three targeted fixes applied and pushed as commit b3ba634 on chapter2-extend:

  1. R/compare_mod_2a.R:60-66: dplyr::left_join() now uses by = dplyr::join_by(biomarker, param) and explicitly sets relationship = "one-to-one" per CLAUDE.md.
  2. R/jags_data_2a.R:29 — Multi-biomarker warning string flattened to one line; cli no longer renders an embedded newline plus indentation.
  3. R/sim_case_data_2a.R:37,40 — Pre-allocation is now n * length(biomarkers) and the inner loop is for (k in seq_along(biomarkers)), so the simulator honours the full biomarkers argument instead of silently dropping past the second.

Note on fix 1

I kept suffix = c("_ch1", "_2a") rather than the c("_ch1", "_aa") shown in the example snippet — the rest of compare_mod_2a() references mean_med_2a / var_med_2a columns (lines 65–68), so changing the suffix to _aa would break those lookups. Let me know if you actually want the suffix renamed and I'll update the downstream column references too.

- compare_mod_2a(): use join_by() and set relationship = "one-to-one"
  on the shared-parameter left_join (CLAUDE.md requirement).
- jags_data_2a(): flatten the multi-biomarker warning to a single line
  so cli does not render an embedded newline plus indentation.
- sim_case_data_2a(): drop the hardcoded n * 2 pre-allocation and
  for (k in 1:2) loop so the simulator honours length(biomarkers).

Co-authored-by: Kwan-Jenny <Kwan-Jenny@users.noreply.github.com>