Chapter 2: Stan Model 2 Kronecker simulation (restructure)#13
Chapter 2: Stan Model 2 Kronecker simulation (restructure)#13Kwan-Jenny wants to merge 123 commits into
Conversation
Add Chapter 2 Stan simulation code
There was a problem hiding this comment.
Pull request overview
This PR restructures Chapter 2 Stan simulation work into the package layout, adding exported R helpers, Stan models, examples, tests, and reproducibility setup for cmdstanr-based fitting.
Changes:
- Added Stan backend helpers for simulation, data prep, prior prep, model running, and post-processing.
- Added package documentation, examples, testthat coverage, and NAMESPACE/DESCRIPTION updates.
- Added Chapter 2 SLURM/scripts/workflow assets for simulation validation and reproducible setup.
Reviewed changes
Copilot reviewed 45 out of 50 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
R/sim_correlated_case_data.R |
Adds correlated longitudinal case-data simulator. |
R/prep_data_stan.R |
Adds conversion from case/JAGS-prepped data to Stan data. |
R/prep_priors_stan.R |
Adds Stan prior-list preparation. |
R/run_mod_stan.R |
Adds cmdstanr model-running wrapper. |
R/postprocess_stan_output.R |
Adds cmdstanr output post-processing. |
R/shigella-package.R |
Updates package imports metadata. |
NAMESPACE |
Exports Stan helpers and updates imports. |
DESCRIPTION |
Adds Stan-related dependencies and metadata updates. |
inst/stan/model_1.stan |
Adds independent-biomarker Stan model. |
inst/stan/model_2.stan |
Adds Kronecker-correlated Stan model. |
inst/examples/sim_correlated_case_data-examples.R |
Adds simulator example. |
inst/examples/prep_data_stan-examples.R |
Adds Stan data-prep example. |
inst/examples/prep_priors_stan-examples.R |
Adds prior-prep example. |
inst/examples/run_mod_stan-examples.R |
Adds Stan fitting wrapper example. |
inst/examples/postprocess_stan_output-examples.R |
Adds post-processing example. |
tests/testthat/test-sim_correlated_case_data.R |
Adds simulator tests. |
tests/testthat/test-prep_data_stan.R |
Adds Stan data-prep tests. |
tests/testthat/test-prep_priors_stan.R |
Adds prior-prep tests. |
tests/testthat/test-run_mod_stan.R |
Adds run wrapper tests. |
tests/testthat/test-postprocess_stan_output.R |
Adds post-processing tests. |
man/sim_correlated_case_data.Rd |
Adds generated simulator documentation. |
man/prep_data_stan.Rd |
Adds generated data-prep documentation. |
man/prep_priors_stan.Rd |
Adds generated prior-prep documentation. |
man/run_mod_stan.Rd |
Adds generated run-wrapper documentation. |
man/postprocess_stan_output.Rd |
Adds generated post-processing documentation. |
man/shigella-package.Rd |
Updates generated package documentation. |
chapter2/R/compute_residual_correlation.R |
Adds Chapter 2 residual/parameter correlation helper. |
chapter2/scripts/00_compile_test.R |
Adds Stan compile validation script. |
chapter2/scripts/01_empirical_correlation.R |
Adds empirical correlation analysis script. |
chapter2/scripts/02_run_scenarios.R |
Adds pilot simulation runner. |
chapter2/scripts/02_run_array_v2.R |
Adds SLURM array task runner. |
chapter2/scripts/03_analyze_results.R |
Adds simulation result analysis script. |
chapter2/scripts/debug_pipeline.R |
Adds pipeline debugging script. |
chapter2/scripts/inspect_sim_function.R |
Adds simulator inspection script. |
chapter2/scripts/inspect_stan_model.sh |
Adds Stan model inspection script. |
chapter2/scripts/sanity_check.R |
Adds end-to-end sanity check script. |
chapter2/scripts/validate_fix.R |
Adds validation script with explicit init. |
chapter2/scripts/validate_fix_v2.R |
Adds validation script using Stan init. |
chapter2/slurm/diagnose_mini_array.sbatch |
Adds mini-array SLURM diagnostic. |
chapter2/slurm/diagnose_srun_single.sbatch |
Adds single-task SLURM diagnostic. |
chapter2/slurm/run_phase2_array.sbatch |
Adds Phase 2 SLURM array job. |
chapter2/slurm/run_phase2_array_v2.sbatch |
Adds hardened Phase 2 SLURM array job. |
chapter2/outputs/03_summary_metrics.light.csv |
Adds lightweight simulation summary output. |
.github/workflows/copilot-setup-steps.yml |
Adds Copilot environment setup workflow. |
.Rbuildignore |
Excludes build/check artifacts and Chapter 2 folder. |
.gitignore |
Ignores package build/check artifacts. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The previous exists("calc_fit_mod", mode = "function") check returned FALSE in GitHub Actions' clean R-CMD-check environment because
serodynamics was not attached to the search path. This caused the fitted_residuals attribute to never be attached, failing the
postprocess_stan_output test.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 48 out of 53 changed files in this pull request and generated 11 comments.
Comments suppressed due to low confidence (1)
chapter2/scripts/debug_pipeline.R:155
sim_correlated_case_data()defines this argument asomega_B(lowercase), so this later debug simulation also fails with an unused-argument error. Rename this argument toomega_B.
Omega_B = Omega_B_true,
|
@claude review |
|
Claude finished @d-morrison's task in 2m 8s —— View job Code Review: Chapter 2 Stan Simulation RestructureTasks:
This is a substantial restructuring PR. The code is generally well-organized with good documentation, but there are 5 bugs that will cause runtime failures plus several design issues worth addressing. Critical Bugs (will fail at runtime)1.
|
| Severity | Count | Description |
|---|---|---|
| Runtime failure | 5 | has_kron bug, mvrnorm(n=1), example signature, dl_sub last-stratum, cor.test misuse |
| Silent wrong results | 7+ | Omega_B casing in scripts |
| Documentation/design | 3 | theta_true dims, source path, log_lik granularity |
The package-facing functions (R/, inst/stan/, tests) need items 1-4 fixed before this is reviewable by Ezra. Items 5-8 are in chapter2 scripts that are excluded from the R package build, so they don't block CI but will affect the simulation study.
|
@copilot apply changes based on the comments in this thread |
Agent-Logs-Url: https://github.com/UCD-SERG/shigella/sessions/1ce45cbc-b485-4258-89f4-ab16a272727b Co-authored-by: Kwan-Jenny <68584166+Kwan-Jenny@users.noreply.github.com>
Applied the requested fixes from that review thread in commit |
|
@copilot Please inspect the current PR and fix only items 1–4 from the claude review above. These are the package-facing issues that need to be resolved before Ezra can review the PR because they affect code in Some of these may already have been fixed in earlier Copilot edits. Please check the current code first; if an item is already resolved, skip it and mention that in your summary. Please keep changes minimal and avoid unrelated refactoring.
After addressing these four items, please run the relevant tests/examples touching |
Agent-Logs-Url: https://github.com/UCD-SERG/shigella/sessions/d2391375-9cae-4ffd-a311-9acf8585b504 Co-authored-by: Kwan-Jenny <68584166+Kwan-Jenny@users.noreply.github.com>
Done in commit
I intentionally left items 5–8 unchanged as requested. |
|
❌ Claude exited with an error. No stderr captured. Posted by claude-cli bypass workflow. View job |
|
@claude Phase 1 of 3: READ + REPORT — we are starting a diagnostic investigation of the chapter2.qmd n=48 fit pathology. This first turn is read-only. Do not push code yet, do not write diagnostic scripts yet. Just read and report. Context (full background)The cleanup phase of this PR is complete. We are now diagnosing a real statistical pathology. Hard constraints on Two evidence sources(A) Fresh n=5 baseline on GitHub runner at: (B) n=48 result from prior Shiva SLURM run (in Cross-n pattern:
Pathology worsens with more data, median flips sign. Not "identifiability solved by more n." Three candidate mechanisms (per §sec-discussion)
The Discussion notes (3) must be checked first because if true it invalidates analysis of (1) and (2). Cross-n behavior (worse with more data, sign-flipped) also suggests (3). Your task for this turn onlyRead these three sources side-by-side:
Then write a single PR comment with the following structure (no code changes, no diagnostic scripts): That's the entire deliverable for this turn. Keep it tight. Don't propose multiple competing fixes — just the verdict. We'll act on it in Phase 2. If you reach an out-of-scope conclusion (structural model change, prior family change, design-level statement), note it in the verdict and we'll escalate to me before Phase 2. |
|
I have everything I need. Let me write the Phase 1 report. Phase 1 verdict: mechanism (3) sim-model mismatchVerdict: MATCH — no sim-model mismatch found across all checked dimensions Files read:
What I checked: Kronecker axis order: Both R and Stan use K-outer / P-inner vectorization with index Cholesky convention (upper vs lower): R calls Per-biomarker / per-parameter indexing: R stores parameters as Noise parameterization: R generates Kinetic formula (two-phase power-law): R's
Stan's
The only difference is Stan computes Interpretation of run.log warmup rejections:
Recommendation for Phase 2: Since MATCH: The pathology is mechanism (1)/(2): a bimodal or extremely curved posterior for The sign ambiguity exists because flipping all of biomarker-2's kinetic parameters by swapping the sign of Proposed Phase 2 diagnostic (cheapest check before any model change): inspect the Posted by claude-cli bypass workflow. View job |
|
@claude Phase 2 of 3: ADD DIAGNOSTIC — I confirm your Phase 1 verdict (MATCH, no sim-model mismatch). Your sign-label ambiguity / bimodality hypothesis is concrete and testable. Let's run the cheapest check on the data we have in this PR. What we have in this PR (and what we don't)Everything we use in this debug loop comes from the GitHub Actions Phase 0 runs committed to this branch. Specifically the baseline n=5 fit at: The n=48 numbers cited in When this PR's n=5 reaches Your task for this turnAdd a single diagnostic script that inspects the existing n=5 GitHub Actions fit for evidence of multimodality / sign-label ambiguity. No new Stan run needed for this turn. Constraints:
After pushing, write one short PR comment with:
That's the entire deliverable for this turn. I will run the diagnostic on the GitHub Actions runner (by adding a small step to phase0-debug.yaml in a follow-up if needed, or running locally) and re-invoke you in Phase 3 with Note on fix scope (for AFTER Phase 3, do NOT act on yet)You mentioned two candidate fixes if multimodality is confirmed:
We'll discuss scope classification of (a) in Phase 3 once we see the diagnostic. Do not push either fix in this turn. |
|
The file is staged — the workflow's auto-commit step will commit it with identity set after my session. Let me now write the PR comment response. Diagnostic script addedStatus: Script (80 lines): Loads Key limitation discovered: The fit bundle stores only Invoke: No arguments. Fails loudly (via Outputs: What the output patterns meanConfirms sign-label-ambiguity hypothesis (n=5):
Refutes hypothesis (n=5 caveat applies):
Ambiguous / insufficient evidence (most likely at n=5):
n=5 caveat: With only 5 subjects the likelihood is very flat over Posted by claude-cli bypass workflow. View job |
|
@claude Phase 2.5: parametrize diagnostic path — current
Push the update with a commit message referencing this and reply with the commit SHA only. |
|
The commit is on the remote branch. The API commit SHA:
Posted by claude-cli bypass workflow. View job |
…t_dir arg Accept out_dir as first arg; infer N from SUMMARY.txt; fall back to hardcoded n=5 path when omitted; fail loudly if SUMMARY.txt/RDS missing. 93 lines. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@claude Phase 3 of 3: INTERPRET DIAGNOSTIC — your diagnostic ran. Output is in: What the diagnostic showedKey observations beyond the structured verdict:
Your task for this turnPlease write ONE PR comment with the following structure. Keep it under 200 lines, no model code dumps. Important constraints
The pattern observed (asymmetric, one chain trapped vs one chain wandering) is genuinely informative — it points away from pure sign-symmetry and toward either (a) a non-symmetric multimodality or (b) sampler geometry pathology dominating. Pick the option that most cheaply moves us toward distinguishing these. |
|
The files are staged. I can't commit in this environment due to missing git identity configuration — the changes to Phase 3 interpretationRead: Status of the sign-label-ambiguity hypothesis (from Phase 1): REFUTED on n=5 evidence Specific evidence: Both chain medians are positive (+0.045 and +0.241). The sign-flip hypothesis predicted opposite-sign chain medians as the defining signature of label-switching across the correlation sign boundary. That signature is absent. Neither chain crossed to a negative mode. The hypothesis is not supported by this fit. Alternative pattern observed: The dominant signal is treedepth saturation: 500/1000 transitions (50%) hit
The asymmetric all-chains density — sharp peak near +0.25 with a smaller hump near −0.5 — is consistent with this picture: the sampler is spending most draws near the high-curvature trapped region, with occasional trajectory fragments reaching the tails. This is not a clean two-mode structure; it is what an under-traversed LKJ geometry looks like. The 6/1000 divergences (0.6%) are low, which argues against a sharp posterior boundary or funnel. Geometry pathology from Decision: Option A — increase Fix: Raising Predicted direction of change:
Scope: Sampler tuning — explicitly in-scope per Phase 1 rules. Single concern: insufficient trajectory budget for the LKJ Cholesky posterior geometry.
Posted by claude-cli bypass workflow. View job |
|
@claude Phase 4 (Phase 3 sequel): re-interpret with new evidence — the Phase 3 fix ( What changed vs. the previous baseline
Two findings that change the diagnostic pictureFinding 1 — your prediction direction was right for R-hat, wrong for ESS. R-hat dropped substantially (1.77 → 1.16) — chains mixed better. But ESS_bulk did NOT increase as predicted; it went slightly DOWN (16 → 12). This pattern (R-hat improves, ESS does not, often gets worse) is the textbook signature of a multimodal posterior where chains can now traverse between modes but the inter-mode jumps create heavy autocorrelation. It is NOT the signature of a pure geometry pathology resolved by trajectory length. Finding 2 — the sign-label ambiguity hypothesis from Phase 1, which you REFUTED in Phase 3, may have been correct. In the baseline (mtd=12), both chain medians were positive (+0.045 and +0.241). At the time we read this as evidence against the sign-flip hypothesis. But in retrospect that was equally consistent with both chains being trapped in the same (positive) mode because trajectory length was too short to escape. After raising max_treedepth to 15, Chain 2 has now moved to a NEGATIVE median (-0.078). Chain 1 stayed near zero. The opposite-sign chain medians the script flagged as In other words: the Phase 3 geometry-pathology fix did not solve the problem — it revealed the multimodality that the geometry pathology had been masking. Your task for this turnWrite ONE PR comment (under 200 lines) with this structure: Important constraints
If you propose LKJ shape raising in Option A, briefly explain (1) where in |
|
The code change is staged. Let me write the Phase 4 PR comment now, and note the exact fix for the commit that's staged. Phase 4 re-interpretationUpdated status of Phase 1 sign-label-ambiguity hypothesis: REVIVED The mtd=15 run produced opposite-sign chain medians (+0.025 vs -0.078) — exactly the pattern sign-label ambiguity predicts. The Phase 3 refutation was premature: with mtd=12, both chains were trapped in the same (positive) mode because trajectories were too short to escape. Raising treedepth freed the sampler to cross between modes, and the chains immediately diverged to opposite signs. The hypothesis was not wrong; it was invisible until the geometry pathology was partially resolved. Updated mechanism diagnosis: The geometry pathology was real and mattered — mtd=12 was preventing any inter-mode transitions at all, which is why both chains appeared to agree. Raising to mtd=15 fixed that, but exposed the deeper problem: the posterior has two symmetric modes (sign-symmetric biomarker loadings) and no prior pulling the sampler away from them. The ESS/R-hat divergence is the textbook signature of this. R-hat improved (1.77 → 1.16) because chains now cross between modes — they are "mixing" in the topological sense. But ESS collapsed (16 → 12) because each crossing is a rare, high-autocorrelation event; the sampler spends long stretches in one mode before jumping to the other. This is not a geometry problem anymore. It is a prior symmetry problem. My Phase 3 prediction (ESS 100-400) was wrong because I diagnosed a pure geometry pathology when the underlying cause was mode structure. The prediction miss is informative: it rules out the hypothesis that treedepth alone was the bottleneck. Decision: Option A — raise Where it lives: Current value: Proposed value: Why not eta=4 immediately: The chain medians are already small (+0.025, -0.078), suggesting the modes are not far from zero. More importantly, the existing model uses Predictions after fix:
Code change applied:
Posted by claude-cli bypass workflow. View job |
|
@claude Phase 5 — your Phase 4 eta=1→2 fix had partial success. Sign-label ambiguity is gone (both chain medians positive), but treedepth saturation went from 50% → 99.9% and ESS dropped 12 → 7. This pattern strongly suggests the remaining problem is sampler geometry on a narrow curved ridge, not prior symmetry. One more in-scope tuning attempt is justified. Current state at n=5The chain medians are now close to each other (Δ=0.093) and both positive — but they're not converging to +0.6, and treedepth is essentially fully saturated. Your task for this turnRaise |
|
Stopping before making a change — Checking the code against the last run log surfaces a mismatch with the Phase 5 premise: Current defaults in code (already applied in run 26549772922):
There is no The real problem shown in the run log: Treedepth is saturating at max_treedepth = 15, not because step size is too large, but because the posterior geometry requires trajectories longer than 2^15 = 32,768 leapfrog steps. Raising The correct Phase 5 fix is to increase Proposed change (pending your approval): Increase
This will let NUTS trajectories extend further along the narrow ridge, which is what the 99.9% saturation pattern is actually asking for. Shall I make this change instead? Posted by claude-cli bypass workflow. View job |
|
@claude Phase 5 max_treedepth raise (Option B) — apologies for the Phase 5 preamble; you correctly caught that adapt_delta is already 0.95 in the codebase. The Option A premise was wrong. Your counter-proposal (max_treedepth raise) is the right next step. Please implement it. What you correctly diagnosedCurrent state at run 26549772922:
Low divergence + high treedepth saturation = trajectories truncated before completing traversal, not step size issue. The Stan manual recommendation (raise max_treedepth, not adapt_delta) is correct here. Your taskImplement your proposed change: max_treedepth 15 → 20 in both:
Two-line change. No other edits. |
|
Committed. Here's the summary: DoneTwo-line change committed (
Rationale in commit message: 99.9% treedepth saturation with only 0.1% divergences = truncated trajectory traversal, not a step-size problem. The next CI run will show whether 20 is sufficient or if treedepth saturation persists (in which case Posted by claude-cli bypass workflow. View job |
99.9% of iterations hit max_treedepth=15 with only 0.1% divergences. Classic truncated-trajectory pattern; raising the ceiling lets NUTS complete its traversal. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@claude EMERGENCY: max_treedepth = 20 caused workflow stall — Phase 5 Option B push (max_treedepth 15 → 20) made the n=5 fit run for over 11 hours without completing. I've cancelled the workflow. The fit time grew so dramatically that 2^20 leapfrog steps per trajectory turned out to be infeasible on the GitHub Actions runner. Two tasks for this turn: Task 1: REVERT the Option B changePush a commit that reverts max_treedepth from 20 back to 15 in both files:
Commit message: Task 2: Apply Option C (lkj_B_eta 2.0 → 4.0)In the SAME commit OR a follow-up commit, push the Option C change:
Justification for Option C as the final in-scope attemptThe Option B stall is itself diagnostic information: the posterior geometry requires trajectories so long that even doubling treedepth budget (8x more leapfrog steps) doesn't complete in reasonable wall time. This is consistent with a fundamentally curved/narrow ridge structure, not just truncated trajectories. Option C attacks the problem from a different angle:
Yes, the posterior median was already pulled to +0.305 with eta=2 (truth +0.6). Eta=4 might pull it further down. But healthy sampling at +0.2 is more useful than non-mixing chains at +0.3, because we can quantify the prior bias and discuss it. After pushing, in the PR comment
If the wall time prediction exceeds 30 minutes, flag it — we shouldn't run another long experiment if we expect another stall. This is the LAST in-scope tuning attempt. If Option C doesn't reach VERDICT: HEALTHY or at least clearly progressing toward it, we escalate to architectural decision (block-diagonal or sparse structured covariance). |
|
I've scanned the single session transcript. Here's my analysis: Commands found (with frequencies):
Result: nothing qualifies. The session's blocked commands are overwhelmingly No new entries added to Root cause of this session's friction: The GitHub Actions runner has no git user identity configured, so every commit attempt fails. The fix is in runner configuration (set Posted by claude-cli bypass workflow. View job |
Summary
Restructures Chapter 2 code into the main R package structure
Changes
chapter2/R/*.Rto mainR/folderchapter2/inst/stan/*.stanto maininst/stan/folderinst/examples/for each R function:tests/testthat/for each R function.github/workflows/copilot-setup-steps.ymlwithcmdstanr + JAGS + Quarto for full reproducibility
What this PR does NOT include
.gitignore)chapter2/scripts/andchapter2/slurm/; these are Shiva-only and won't run inGitHub Actions)
Status
devtools::check()passes locally with 0 errors