chapter1-repro: reproducible Chapter 1 — manuscript (chapter1.qmd -> docx/pdf) + main/supplement figure-table QMDs, all built from library(shigella)#28
|
@claude review |
|
Now I have a complete picture. Here is my review: PR Review:
|
|
@claude Could you review the latest commits and let me know whether the PR has improved since your previous review, especially regarding readability, refactoring, and package organization? Please also check whether the implementation matches the PR description and identify any remaining issues before I ask for another human review. |
|
Now I have a complete picture. Here is my assessment: Review update:
|
|
@claude Fix Two CI jobs are still red on
Already landed — do not redo:
A.
|
| Replace | With | Files |
|---|---|---|
| message() | cli::cli_inform() | build_sensitivity_results.R (2×), get_mae.R |
| stop() | cli::cli_abort() | get_observed.R (4×), load_fit_obj.R |
| warning() | cli::cli_warn() | get_observed.R, get_prediction_summary.R |
The .data / .env object_usage warnings are resolved by step B.
Re-run lintr::lint_package(). For anything still flagged:
- lines still >80 chars that .lintr doesn't allow → run styler::style_pkg() (confirm the diff is whitespace-only), wrap the line, or as a last resort append # nolint: line_length_linter.
- figure_s3_additional_comparison (object_length_linter, 31 chars) → append # nolint: object_length_linter. rather than renaming, to avoid touching its call sites in the supplement QMD.
- any object_name_linter hit → rename the local variable to snake_case.
G. chapter1.qmd — make it consistent with the other two QMDs
In vignettes/manuscript/chapter1.qmd:
- Replace the source()-based setup chunk:
    r_dir <- "~/chapter1-repro/R"
    invisible(lapply(list.files(r_dir, pattern = "[.][Rr]$", full.names = TRUE), source))
  with library(shigella) (same as chapter1_main_figures_tables.qmd / chapter1_supplement_figures_tables.qmd).
- Remove the three hardcoded "~/chapter1-repro/path/to/..." placeholder paths; use the same rda_dir + raw-Excel CONFIG convention the other two QMDs use.
H. Minor review items
- posterior_pred.R: change defaults data = NA, raw_dat = NA → NULL, and update the corresponding checks (is.na() → is.null()).
- compute_residual_metrics.R: when scale = "log" drops n_nonpos non-positive observations, emit a cli::cli_warn() stating how many were dropped, instead of dropping silently.
Don't
Do not reimplement the serodynamics::: calls (ab, use_att_names, get_timeindays_var). Once serodynamics is in Imports (step A), they downgrade from the WARNING to a NOTE, which is acceptable. Leave them for a separate PR.
When done
Run devtools::check() and lintr::lint_package() and paste both summaries.
Target: R CMD check 0 errors / 0 warnings; lintr clean. Confirm the three manuscript QMDs are still in the repo (just build-ignored).
|
❌ Claude exited with an error. Partial stdout (last 50000 bytes): No stderr captured. Posted by claude-cli bypass workflow. View job |
|
@claude Batch 1 of 2 — Fix On Already landed — do not redo:
A.
|
|
❌ Claude exited with an error. Partial stdout (last 50000 bytes): No stderr captured. Posted by claude-cli bypass workflow. View job |
|
@claude review |
|
Now I have enough to write a thorough review. Code Review: PR #28 —
|
| File | String used |
|---|---|
| model_aesthetics.R, build_figure4_row.R, model_comparison_legend.R, figure_s4_mae_slopegraph.R | "Sero-specific" |
| select_best_models.R, table2_kinetic_params.R, table4_mae.R, plot_model_comparison_forest.R | "Serotype-specific" |
For example, select_best_models() assigns "Serotype-specific" in the best_model column, which then drives table2_kinetic_params.R and table4_mae.R. But build_figure4_row.R maps "Sero-specific" to .model_colours[["Sero-specific"]]. If select_best_models output ever flows into a colour lookup, the sero-specific colour will be NA silently.
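One way to make this class of bug loud rather than silent is sketched below. This is only an illustration: `MODEL_LABELS`, `model_colours`, `lookup_colour()`, the colour values, and the choice of "Serotype-specific" as the canonical string are all assumptions, not the package's actual `.model_colours` definition.

```r
# Hypothetical single source of truth for model labels (names are assumptions).
MODEL_LABELS <- c(overall = "Overall", serotype = "Serotype-specific")

# Illustrative palette only; the real one lives in the package's .model_colours.
model_colours <- c("Overall" = "blue", "Serotype-specific" = "red")

# Look up colours and stop on unknown labels instead of returning NA.
lookup_colour <- function(labels, palette = model_colours) {
  unknown <- setdiff(unique(labels), names(palette))
  if (length(unknown) > 0) {
    stop("Unknown model label(s): ", paste(unknown, collapse = ", "),
         call. = FALSE)
  }
  unname(palette[labels])
}

lookup_colour(MODEL_LABELS[["serotype"]])
```

With a guard like this, a stray "Sero-specific" string would raise an error at the lookup instead of producing an NA colour.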
3. Calls to serodynamics::: internal API
Five call sites use triple-colon (:::) to access private serodynamics functions:
serodynamics:::ab(...) # pop_mean_curve.R, predict_posterior_at_times.R, posterior_pred.R, get_prediction_summary.R
serodynamics:::use_att_names() # posterior_pred.R
serodynamics:::get_timeindays_var() # compute_residual_metrics.R
These bypass the public API and will break silently if serodynamics is updated. Since serodynamics is an in-house package, the right fix is to either export ab() / use_att_names() / get_timeindays_var() from serodynamics, or copy the relevant logic.
Medium
4. Placeholder paths in chapter1.qmd are not just approximate — they contain path/to/
raw_compiled <- "~/chapter1-repro/path/to/3.8.2024 Compiled Shigella datav2.xlsx"
raw_metadata <- "~/chapter1-repro/path/to/Additional metadata..."
raw_durdia <- "~/chapter1-repro/path/to/Duration of diarrhea..."
The path/to/ segment is a literal unfilled placeholder. Anyone trying to reproduce the manuscript will get file.exists() returning FALSE and silently skip Table 1 and Figure 1: if they set rda_dir and forget to update these, the failure mode is silent (the flags gate those sections rather than erroring). Consider replacing with stop("Set raw_compiled to the path of ...") or at minimum noting in the config block that these three paths must be filled in.
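A minimal sketch of such a guard follows, written in base R so it runs standalone; inside the package the same check would presumably use cli::cli_abort(). The helper name `require_path()` and its messages are hypothetical.

```r
# Hypothetical helper: abort early when a raw-data path is unset, still a
# placeholder, or does not exist on disk.
require_path <- function(path, what) {
  if (is.null(path) || !nzchar(path) ||
      grepl("path/to/", path, fixed = TRUE)) {
    stop("Set the path for ", what, " in the CONFIG chunk before rendering.",
         call. = FALSE)
  }
  if (!file.exists(path)) {
    stop("File for ", what, " not found: ", path, call. = FALSE)
  }
  invisible(path)
}

# Example with a file that exists (a temp file stands in for the workbook).
tmp <- tempfile(fileext = ".xlsx")
writeLines("placeholder", tmp)
require_path(tmp, "raw_compiled")
```

Whether a missing workbook should abort or merely skip Table 1 and Figure 1 is a design decision for the author; the sketch only shows the erroring variant.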
5. pop_mean_curve() uses tidyr::crossing() on full posterior draws
tidyr::pivot_wider(...) |>
  tidyr::crossing(t = t_grid) # cross-join: n_draws × 43 time points
crossing() performs a full Cartesian product. With 10M MCMC iterations (noted in plot_model_comparison_forest annotations) this would produce hundreds of millions of rows. Even after thinning, this approach deserves a note about expected draw count, or a slice_sample() guard before the cross-join.
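The guard could look roughly like the base-R sketch below; in the package, dplyr::slice_sample() followed by tidyr::crossing() would play the same roles. `thin_draws()`, `max_draws`, and the toy `draws` data frame are assumptions made for illustration.

```r
# Hypothetical guard: cap the number of posterior draws before the cross-join.
# Note that set.seed() here resets the global RNG state.
thin_draws <- function(draws, max_draws = 1000L, seed = 1L) {
  if (nrow(draws) <= max_draws) {
    return(draws)
  }
  set.seed(seed)
  draws[sort(sample.int(nrow(draws), max_draws)), , drop = FALSE]
}

t_grid <- seq(0, 210, by = 5)  # 43 time points, as in the review comment

# Toy stand-in for the wide table of posterior draws.
draws <- data.frame(draw = seq_len(5000), y0 = runif(5000))

thinned <- thin_draws(draws, max_draws = 1000L)

# merge(..., by = NULL) is base R's Cartesian product (cross-join).
curve_grid <- merge(thinned, data.frame(t = t_grid), by = NULL)
nrow(curve_grid)
```

The row count after the join is always the number of retained draws times the grid length, so the cap gives a predictable upper bound on memory.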
6. O(n²) loop in .ppc_simulate()
for (j in seq_len(n_sim)) {
  ...
  sim_tab <- dplyr::bind_rows(sim_tab, one) # grows by copy each iteration
}
bind_rows inside a loop reallocates on every iteration. Replace with purrr::map_dfr(seq_len(n_sim), function(j) {...}) to collect into a list first.
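The collect-then-bind pattern can be sketched in base R as follows. `simulate_one()` is a hypothetical stand-in for the loop body of .ppc_simulate(), whose real contents are not shown in this thread.

```r
# Hypothetical stand-in for one simulation step (the real loop body differs).
simulate_one <- function(j) {
  data.frame(sim = j, value = j^2)
}

n_sim <- 100

# Build all pieces first, then bind once: no repeated copying of sim_tab.
sim_list <- lapply(seq_len(n_sim), simulate_one)
sim_tab <- do.call(rbind, sim_list)

nrow(sim_tab)
```

dplyr::bind_rows(sim_list) would be a drop-in replacement for the do.call(rbind, ...) line if the package already imports dplyr.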
Minor
7. parse_sensitivity_filename() will stop on antigen names containing _
parts <- strsplit(nm, "_")[[1]]
if (length(parts) != 4) stop("Unexpected file name format: ...")
If any sensitivity file uses an antigen like sf2a_osp (as seen in figure_s4_mae_slopegraph.R's dplyr::recode call where "sf2a_osp IgA" is recoded), this function will stop with an opaque error. Consider splitting on only the last 3 underscores, or switching to a regex like ^sensitivity_(.+)_(IgG|IgA)_(primary|diffuse|informative)_(overall|serotype)\\.rda$.
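A sketch of the regex variant is below. It assumes file names follow the pattern quoted in the review comment (including the `sensitivity_` prefix and `.rda` suffix), which may not match the real files; the function name and the returned field names are hypothetical.

```r
# Hypothetical regex-based parser; tolerates underscores inside the antigen.
parse_sensitivity_filename_sketch <- function(nm) {
  pattern <- paste0(
    "^sensitivity_(.+)_(IgG|IgA)_",
    "(primary|diffuse|informative)_(overall|serotype)\\.rda$"
  )
  m <- regmatches(nm, regexec(pattern, nm))[[1]]
  if (length(m) == 0) {
    stop("Unexpected file name format: ", nm, call. = FALSE)
  }
  # m[1] is the full match; m[2:5] are the capture groups.
  list(antigen = m[2], isotype = m[3], prior = m[4], model = m[5])
}

parse_sensitivity_filename_sketch("sensitivity_sf2a_osp_IgA_diffuse_serotype.rda")
```

Because the last three fields are restricted to fixed alternatives, the greedy `(.+)` group can only absorb the antigen part, underscores included.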
8. posterior_pred() default data = NA is misleading
posterior_pred <- function(data = NA, raw_dat = NA, ...)
NA as a default for a required model object produces confusing downstream errors (e.g. Error in attr(NA, "population_params")). Use no default and let R's "argument is missing" message surface, or add an explicit if (identical(data, NA)) stop("'data' must be a fitted sr_model").
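For comparison, a NULL-default version of the argument check might look like the sketch below. The function name, messages, and return value are placeholders; only the argument names data and raw_dat come from the thread.

```r
# Hypothetical signature sketch: NULL defaults with explicit is.null() checks.
posterior_pred_sketch <- function(data = NULL, raw_dat = NULL) {
  if (is.null(data)) {
    stop("'data' must be a fitted sr_model, not NULL.", call. = FALSE)
  }
  if (is.null(raw_dat)) {
    stop("'raw_dat' must be supplied.", call. = FALSE)
  }
  # Placeholder body: the real function computes posterior predictions.
  invisible(TRUE)
}
```

is.null() always returns a single TRUE/FALSE, whereas is.na() on a model object or data frame returns one value per element, which is awkward inside if().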
9. Magic number in build_table1_data.R
followup_days = max(.data[["Actual day"]], na.rm = TRUE) + 2
The + 2 has no explanation. This presumably accounts for a lead-in or observation window; it warrants a brief comment.
10. compute_residual_metrics() silently drops non-positive observations on log scale
n_nonpos <- sum(residual_data$obs <= 0) + sum(residual_data$pred_med <= 0)
if (n_nonpos > 0) {
  residual_data <- dplyr::filter(residual_data, .data$obs > 0, .data$pred_med > 0)
}
Non-positive values are silently removed. In a reproducibility context, add a message() or cli::cli_warn() noting how many rows were dropped, so this doesn't go unnoticed.
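A base-R sketch of the warning is below; the package version would presumably call cli::cli_warn() instead of warning(). `drop_nonpositive()` is a hypothetical helper. One detail worth noting: n_nonpos as written above adds the two per-column counts, so a row where both obs and pred_med are non-positive is counted twice; the sketch counts dropped rows directly.

```r
# Hypothetical helper: drop rows unusable on the log scale and say how many.
drop_nonpositive <- function(residual_data) {
  keep <- residual_data$obs > 0 & residual_data$pred_med > 0
  keep[is.na(keep)] <- FALSE  # treat missing values as unusable too
  n_dropped <- sum(!keep)
  if (n_dropped > 0) {
    warning(
      sprintf(
        "Dropped %d row(s) with non-positive or missing obs/pred_med before computing log-scale metrics.",
        n_dropped
      ),
      call. = FALSE
    )
  }
  residual_data[keep, , drop = FALSE]
}

# Toy data: row 2 is non-positive in both columns, row 4 in obs only.
toy <- data.frame(obs = c(1, 0, 2, -1), pred_med = c(1, 0, 3, 2))
cleaned <- suppressWarnings(drop_nonpositive(toy))
nrow(cleaned)
```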
Informational
11. Vignette sources R files directly rather than calling library(shigella)
invisible(lapply(list.files(r_dir, ...), source))
The vignette exercises the raw source files, not the installed package. This means it won't catch packaging issues (undeclared imports, unexported symbols). Once the DESCRIPTION Imports: issue is resolved, consider switching to library(shigella) and removing the r_dir path config, making the vignette a proper package vignette.
12. reattach_case_attrs.R file name doesn't match function prefix convention
The file is named reattach_case_attrs.R but the function is .reattach_case_attrs() (prefixed dot = internal). The other internal helpers (add_kinetic_headers.R → .add_kinetic_headers()) follow the same pattern. This is consistent within the PR but differs from the file-name-matches-function convention used for exported functions. Not blocking, just worth noting for future contributors.
|
@claude Batch 1 of 4 —
Already landed — do not redo:
A.
|
|
All 7 files staged, no untracked junk. The workflow's auto-commit sweep will land these cleanly.
Batch 1 done: what was changed
A.
|
| File | Change |
|---|---|
| R/shigella-package.R | #' @importFrom rlang .data .env added; NAMESPACE updated with importFrom(rlang,.data) / importFrom(rlang,.env) |
| R/prep_heatmap_data.R | utils::globalVariables(c(":=", ".", "sid", "unq_id", "isotype", "study_name", "serotype", "cohort_name", "pid", "age", "row_id", "display_label", "antigen", "antigen_clean", "timepoint", "timepoint_label")) at file top; covers all data.table NSE bare names |
| R/figure_crossreactivity_heatmap.R | utils::globalVariables("isotype") at file top; covers shig_bg[isotype == "IgG"] data.table subset |
| R/table_s1_priors.R | rows = Level == "..." → rows = .data$Level == "..." in all three gt::tab_row_group() calls (gt uses eval_tidy() so .data works) |
Note on commit: The git identity wasn't pre-configured in this runner (the workflow sets it in a post-Claude step). The changes are fully staged; the workflow's auto-commit sweep will land them on chapter1-repro.
|
@claude Batch 2 of 4 — Roxygen
C. Roxygen
|
|
The workflow's allowed-tools list doesn't include
Here is the summary of all changes made:
Batch 2 complete: 24 files changed (12 R sources + 12 Rd files)
What was done
Cross-package fix:
No change needed:
Note on
|
|
@claude Batch 3 of 4 — Vignette build-ignore +
D. Manuscript vignettes: clears the files in vignettes WARNING and the no vignettes NOTE
The
usethis::use_build_ignore("vignettes/manuscript")
Verify
E.
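One way to verify step D without running a full build is sketched below. It relies on two assumptions: that use_build_ignore() writes the escaped pattern ^vignettes/manuscript$ to .Rbuildignore, and that R CMD build matches each pattern as a case-insensitive Perl-style regex against paths relative to the package root. `is_build_ignored()` is a hypothetical helper.

```r
# Hypothetical check: does any .Rbuildignore pattern match this relative path?
is_build_ignored <- function(path, patterns) {
  any(vapply(
    patterns,
    function(p) grepl(p, path, perl = TRUE, ignore.case = TRUE),
    logical(1)
  ))
}

# In the repo this would be: patterns <- readLines(".Rbuildignore")
patterns <- c("^vignettes/manuscript$")  # assumed output of use_build_ignore()

is_build_ignored("vignettes/manuscript", patterns)
is_build_ignored("vignettes/other", patterns)
```

The authoritative check is still the R CMD check log: the vignette WARNING and NOTE should be gone while the three QMDs remain tracked in git.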
|
|
All four changes are staged. R is not installed in this CI environment so I can't run Batch 3 — changes committedD —
|
|
@claude Batch 4 of 4 — lintr job + remaining code-review fixes
F. lintr job — make the lint workflow pass
H1. Fix the label mismatch (HIGH: a real bug, not style)
The serotype-specific model is labelled two different ways across files, so colour lookups can silently return
Pick one canonical label and use it everywhere (recommend
H2. Remaining code-review items
Performance notes (optional, only if quick and behavior-preserving)
Don't
Do not reimplement the
When done
Run |
|
I can see the workflow handles commits after I exit. The "Push Claude's commits to PR branch" step will auto-commit any staged/unstaged changes ( Here is the final report: DoneA.
|
|
@claude review |
|
Now I have enough context to write a comprehensive review. PR Review:
|
|
@claude Next batch —
Item A (FIX) — hardcoded
|
|
All three changes are staged and will be auto-committed by the workflow. Here's the summary: DoneItem A —
|
|
@claude Audit hardcoded constants only; do not edit
Do not modify any files. Before starting, run git status --short. If the working tree is not clean, stop and report.
Task
Search the repository for hardcoded numeric/string constants that could affect analysis, tables, figure labels, paths, or data interpretation. Focus on:
Do not focus on generated files, .Rd files, .Rproj.user, .git, or old logs. Look especially for:
For each finding, classify it into one of these categories:
Report format
For each finding, report:
Important constraints
Do not edit anything. Then stop. |
|
I have enough to write a complete, verified audit report. Hardcoded Constants Audit —
|
| Symbol | Category |
|---|---|
| 🔴 | Must fix before merge |
| 🟡 | Needs human decision |
| 🟢 | Already handled or documented |
| ⚪ | Harmless style/plot constant |
1. Hardcoded sample sizes in labels / figure text
| # | File | Line | Value | Risk | Action |
|---|---|---|---|---|---|
| 1 | R/figure_individual_comparison.R | 24 | "A) S. flexneri 2a (n = 17): 2-way comparison" | 🔴 Silently wrong if data subset changes; figure_population_trajectories.R uses sprintf with computed n, so the pattern is inconsistent. | Compute n from n_distinct(datasets$dL_clean_sf2a$id) |
| 2 | R/figure_individual_comparison.R | 29 | "B) S. sonnei (n = 11): 2-way comparison" | 🔴 Same risk. | Compute from data |
| 3 | R/figure_individual_comparison.R | 34 | "C) S. flexneri 3a (n = 8; combined n = 25): 3-way comparison" | 🔴 Two counts, both hardcoded; "n = 25" requires summing Sf2a + Sf3a, so it's doubly fragile. | Compute from data |
| 4 | R/figure_ipab_with_age.R | 8, 13, 23, 27 | "Overall (n=48)" used as both group label key and factor level | 🔴 If total N changes the factor key and all downstream color/linetype look-ups break silently (they're string-matched). | Compute total N and interpolate into the string |
| 5 | R/plot_model_comparison_forest.R | 137–138 | "Blue = overall (n=48); Red = serotype-specific; Green = combined S. flexneri (n=25, where applicable)" | 🔴 Subtitle text; n=25 is the Sf2a+Sf3a sum, so it is wrong if either serotype grows or shrinks. | Compute from model inputs |
| 6 | R/table1_study_population.R | 57–58 | "\"Other\" serotypes include S. boydii (n=2), S. dysenteriae (n=1), S. flexneri 1c (n=1), untypeable S. flexneri (n=2), and RLDT-positive Shigella without culture confirmation (n=1)." and "Not all 48 participants completed full follow-up" | 🔴 Footer is static text in an otherwise generated table; every sub-count will silently mismatch if data changes. | Compute "Other" composition from data; replace 48 with computed n |
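The "compute from data" action for findings 1–3 could be sketched as below. `panel_title()` and the toy id vector are hypothetical; in the package the ids would come from datasets such as dL_clean_sf2a.

```r
# Hypothetical helper: derive the panel title's n from the data's id column.
panel_title <- function(prefix, ids, suffix) {
  sprintf("%s (n = %d): %s", prefix, length(unique(ids)), suffix)
}

# Toy ids with repeats (one row per visit, several visits per participant).
ids_sf2a <- c("p01", "p01", "p02", "p03", "p03", "p03")

panel_title("A) S. flexneri 2a", ids_sf2a, "2-way comparison")
```

Counting distinct ids rather than rows matters here because the longitudinal data hold several samples per participant.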
2. Hardcoded summary statistics in manuscript prose
These are in vignettes/manuscript/chapter1.qmd. The Introduction and Abstract mentions of 48 and 200 days refer to fixed study-design values, but the Results section (lines 401–407) contains data-derived statistics that will silently diverge from the generated tables if data ever changes.
| # | File | Line | Value | Risk | Action |
|---|---|---|---|---|---|
| 7 | vignettes/manuscript/chapter1.qmd | 401 | "48 individuals … 218 longitudinal serum samples … 200 days" | 🔴 218 is a count of rows, so it is data-derived. 48 is here used as a count of participants; if data is ever subsetted differently, prose and Table 1 diverge. | Inline R: `r nrow(...)` |
| 8 | chapter1.qmd | 402 | "median age of 10.5 years (IQR: 2.0–25.0), with 44% under five … 50%, 24/48 male" | 🔴 All data-derived summary stats typed as prose. Every number can diverge from the generated table. | Replace with inline R expressions |
| 9 | chapter1.qmd | 403 | "17/48, 35% … 11/48, 23% … 8/48, 17% … 7/48, 15% … 5/48, 10%" | 🔴 Per-serotype fractions are data-derived. The n=25 combined-flexneri count also appears here. | Replace with inline R expressions |
| 10 | chapter1.qmd | 405 | "67% … 88% had four or more samples" | 🔴 Percentages computed from follow-up data. | Replace with inline R |
| 11 | chapter1.qmd | 406 | "Median follow-up was 182.0 days (IQR: 30.0–206.0)" | 🔴 Data-derived median/IQR. | Replace with inline R |
| 12 | chapter1.qmd | 407 | "Forty participants (80%) completed full follow-up … the 8 participants lost before day 90" | 🔴 Data-derived counts. 80% also depends on completed_followup threshold (>= 4 visits & >= 90 days), so if that threshold ever changes the prose description is additionally stale. | Replace with inline R |
| 13 | chapter1.qmd | 411–423 (tbl-cohort caption) | "\"Other\" serotypes include … (n=2), (n=1), (n=1), (n=2), (n=1)" | 🔴 Same as finding #6: duplicated from the table footer and doubly at risk. | Derive from data in caption code |
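Small formatting helpers keep the inline R expressions short. The sketch below is an assumption about how that might look; `fmt_frac()` and `fmt_median_iqr()` are hypothetical names, and the toy inputs are not study data.

```r
# Hypothetical helpers for inline R in the Results prose.
fmt_frac <- function(n, total) {
  sprintf("%d/%d, %.0f%%", as.integer(n), as.integer(total), 100 * n / total)
}

fmt_median_iqr <- function(x) {
  q <- stats::quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  sprintf("%.1f (IQR: %.1f\u2013%.1f)", q[2], q[1], q[3])
}

fmt_frac(17, 48)
fmt_median_iqr(c(1, 2, 3, 4, 5))
```

In the QMD these would be called inline, for example `r fmt_frac(n_sf2a, n_total)`, where n_sf2a and n_total are computed from the loaded case data in an earlier chunk.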
3. Hardcoded study design / protocol constants
| # | File | Line | Value | Risk | Action |
|---|---|---|---|---|---|
| 14 | data-raw/_config.R | 52 | symptom_onset_offset_days <- 2L | 🟢 Named constant, used via parameter default in build_table1_data.R:90. Table 1 footer at line 56 also explicitly states "max(Actual day) + 2". Well-documented. | Leave as-is |
| 15 | R/build_table1_data.R | 54 | age < 5 ~ "<5", age >= 5 ~ "≥5" | 🟢 Standard epidemiological under-5 cutoff. The cutoff is also the title of the column, so changing it would require intentional redesign. | Leave as-is |
| 16 | R/subset_age_group.R | 20, 29, 31 | age == 5 exclusion from both strata | 🟢 Documented explicitly in function docstring: "Both groups exclude age == 5." An n_boundary informational message is emitted at runtime. | Leave as-is |
| 17 | R/build_table1_data.R | 109 | n_visits >= 4 & followup_days >= 90 (completed follow-up definition) | 🟡 The threshold defines the completed_followup flag used in Table 1 and is cited as prose ("defined as ≥ 4 visits and ≥ 90 days" at line 407). The two are currently consistent but maintained in separate places. | Human decision: accept duplication, or extract as a named constant in _config.R and reference from both build_table1_data.R and the manuscript |
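If the named-constant option in finding 17 is chosen, it could look something like this sketch. The constant names, `is_completed_followup()`, and `followup_definition_text()` are hypothetical; only the 4-visit and 90-day values come from the audit.

```r
# Hypothetical named constants (these would live in data-raw/_config.R).
followup_min_visits <- 4L
followup_min_days <- 90L

# The flag and the prose definition both read from the same two constants.
is_completed_followup <- function(n_visits, followup_days,
                                  min_visits = followup_min_visits,
                                  min_days = followup_min_days) {
  n_visits >= min_visits & followup_days >= min_days
}

followup_definition_text <- function(min_visits = followup_min_visits,
                                     min_days = followup_min_days) {
  sprintf("defined as \u2265 %d visits and \u2265 %d days", min_visits, min_days)
}

is_completed_followup(c(4, 3, 5), c(90, 200, 89))
```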
4. Hardcoded sentinel values
| # | File | Line | Value | Risk | Action |
|---|---|---|---|---|---|
| 18 | R/build_table1_data.R | 26 | dplyr::na_if(88) | 🟢 The comment immediately after the call reads "88 = 'unknown' sentinel in the metadata workbook; kept as NA so gtsummary excludes it." Purpose is documented and the value will not change unless the upstream Excel codebook changes. | Leave as-is |
5. Hardcoded paths
| # | File | Line | Value | Risk | Action |
|---|---|---|---|---|---|
| 19 | data-raw/_config.R | 11–12 | raw_data_dir <- "" / manuscript_data_dir <- "" | 🟢 Intentional placeholders; the script immediately aborts with a clear error if left empty. | Leave as-is |
| 20 | vignettes/manuscript/chapter1.qmd | 96, 99, 101, 103 | "~/Data/Manuscript", "~/path/to/3.8.2024 Compiled Shigella datav2.xlsx", etc. | 🟡 ~/ is non-portable and the Excel filenames contain a date (3.8.2024) which will be confusing if the file is ever re-exported with a different name. The block is clearly marked ## ---- EDIT THESE ----, so the intent is understood. | Human decision: consider an environment-variable fallback or a shared config sourced by the qmd; at minimum, REPRODUCIBILITY.md could document what each path must point to |
| 21 | vignettes/manuscript/chapter1_main_figures_tables.qmd | 35, 38, 40, 42 | Same ~/Data/Manuscript pattern | 🟡 Same as above: separate file, same issue. | Same recommendation |
| 22 | vignettes/manuscript/chapter1_supplement_figures_tables.qmd | 34 | rda_dir <- "~/Data/Manuscript" | 🟡 Same. | Same |
6. Time-grid constants
| # | File | Line | Value | Risk | Action |
|---|---|---|---|---|---|
| 23 | R/figure_ipab_with_age.R | 102–103 | t_grid = seq(0, 210, by = 5), xlim = c(0, 210) | 🟡 Upper bound is 210, but the study states "up to 200 days." The 10-day overhang is likely intentional (curves extend slightly beyond observed data), but it is inconsistent with build_figure4_row (which uses 200) and with the stated follow-up window. | Human decision: if 210 is intentional for extrapolation clarity, document it; otherwise align with 200 |
| 24 | R/build_figure4_row.R | 117 | xlim = c(0, 200), times = seq(0, 200, by = 1) | 🟡 Uses 200, creating a visible inconsistency with finding #23 (Fig 5A goes to day 210; Fig 4 rows go to day 200). | Same: human decision on canonical follow-up upper bound |
| 25 | R/pop_mean_curve.R | 12 | t_grid = seq(0, 210, by = 5) | 🟡 Consistent with figure_ipab_with_age.R, but see #23. | Same |
| 26 | R/figure_with_individuals.R | 11 | t_grid = seq(0, 210, by = 5) | 🟡 Same as #25. | Same |
7. Sensitivity analysis threshold
| # | File | Line | Value | Risk | Action |
|---|---|---|---|---|---|
| 27 | R/table_s3_sensitivity.R | 16–17 | delta_mae < -0.1 ~ "Serotype better" / delta_mae > 0.1 ~ "Overall better" | 🟡 The ±0.1 MAE bin boundary is undocumented. It affects which cells display "Serotype better" vs "Overall better" and could be interpreted as an arbitrary decision without a methods note. | Human decision: add a source-note in the table or methods section explaining the ±0.1 threshold |
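Independently of the methods note, naming the margin makes the rule easier to document and to cite from a table footnote. The sketch below is an assumption: `mae_margin`, `classify_delta_mae()`, and the middle label "Similar" are hypothetical (the audit only quotes the two outer labels).

```r
# Hypothetical named margin for the MAE comparison bins.
mae_margin <- 0.1

classify_delta_mae <- function(delta_mae, margin = mae_margin) {
  ifelse(
    delta_mae < -margin, "Serotype better",
    ifelse(delta_mae > margin, "Overall better", "Similar")  # "Similar" is assumed
  )
}

classify_delta_mae(c(-0.25, 0.02, 0.3))
```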
8. Duplicated constants that could drift from their source
| # | File | Line | Value | Risk | Action |
|---|---|---|---|---|---|
| 28 | R/plot_model_comparison_forest.R | 144 | "MCMC: 10,000,000 iterations, 50,000 burn-in, 4 chains per model" | 🟡 Hard-codes mcmc_main values from _config.R as a caption string. If MCMC settings are ever tuned, caption will silently lie. | Consider passing mcmc settings as a parameter to the plot function |
| 29 | R/table_s3_sensitivity.R | 67 | "diffuse (2× SD), and informative (0.5× SD)" | 🟢 The scaling factor is correct (matches _config.R:89-93: prec / 4 = wider ≈ 2× SD). Text matches the code. | Leave as-is, but watch if sensitivity configs change |
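For finding 28, building the caption from the settings could be sketched as follows. `mcmc_caption()` and its argument names are hypothetical; the real mcmc_main object in _config.R may be structured differently.

```r
# Hypothetical caption builder driven by the MCMC settings.
mcmc_caption <- function(n_iter, n_burn, n_chains) {
  fmt <- function(x) formatC(x, format = "d", big.mark = ",")
  sprintf(
    "MCMC: %s iterations, %s burn-in, %d chains per model",
    fmt(n_iter), fmt(n_burn), as.integer(n_chains)
  )
}

mcmc_caption(n_iter = 1e7, n_burn = 5e4, n_chains = 4)
```

Passing the settings in as arguments keeps the plot function free of any dependency on data-raw/_config.R.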
9. Harmless constants (no action needed)
| # | File | Location | Value | Note |
|---|---|---|---|---|
| 30 | R/add_natural_scale.R | 21 | 365 (days/year) | Physical constant; correct. |
| 31 | R/adjust_timeindays_durdia.R | 59 | 24 (hours/day) | Physical constant; correct. |
| 32 | R/build_table1_data.R | 72 | \u22654 / <4 factor levels | Display strings; not data-derived counts. |
| 33 | data-raw/_config.R | 57–96 | All MCMC iterations, priors | Correctly isolated in config file. |
| 34 | All figure files | Various | Alpha transparency (0.08, 0.10, 0.18, 0.12) | Pure visual styling. |
| 35 | R/table4_mae.R | 39–40 | Quartile bounds 0.25, 0.75 | Standard IQR; correct. |
Priority summary
Fix before merge (findings 1–13, 28): Six items involve hardcoded sample sizes in R figure/table labels (findings 1–6) and seven involve data-derived prose in the Results section and the Table 1 caption (findings 7–13). These will silently mismatch the generated output if data changes. Finding 28 (MCMC caption), marked 🟡 in the table above, is lower urgency but the same class of problem.
Human decision needed (findings 17, 20–27): Follow-up completion threshold, placeholder path strategy, time-grid upper bound (200 vs 210 days), and the ±0.1 MAE bin cutoff. None are correctness bugs in the current data, but each needs an intentional decision documented somewhere.
Already handled or harmless (findings 14–16, 18–19, 29–35): No action needed.
|
@claude Batch 1 — derive n in
Before editing, run
Verified expected values (the derived numbers MUST equal these)
These were confirmed by: What to change (
|
|
The change is staged. The workflow's post-run step configures git identity and auto-commits any staged changes. Done —
|
| Panel | Dataset | Column | Expected |
|---|---|---|---|
| A | dL_clean_sf2a | id | 17 |
| B | dL_clean_sonnei | id | 11 |
| C | dL_clean_sf3a | id | 8 |
| combined | dL_combined_sf3a | id | 25 |
3. Hardcoded label integers — none remain
grep -n "n *= *[0-9]\|n=[0-9]" R/figure_individual_comparison.R
(no output)
4. Only one file changed
R/figure_individual_comparison.R — 7 insertions, 3 deletions.
Note on commit: The git runner environment has no user identity configured when Claude executes (identity is set in a post-run workflow step). The change is staged; the workflow's "Push Claude's commits" step will sweep it into an auto-commit before pushing.
|
@claude Batch 2 —
Environment reality: the dataset is NOT in the repo and R is not available to you here. You cannot run the code or verify the number. That is expected — write the code correctly and leave verification to a local run. Do not claim you verified any value. Do not refuse just because you can't verify. Before editing, run
The risk this fix addresses
What to do (
|
Verification report
1. ID column used
2. Single variable for the "Overall" key/label
All four sites derive from the same computed variable; there is no separate literal.
3. Remaining
|
| Check | Expected | If wrong |
|---|---|---|
| Legend label for the overall series | Overall (n=48) | If you see a number much larger than 48 (e.g. ~1017), raw_overall is not filtered to SOSAR cases; fix the upstream data step, not this label |
| All three series have colour/linetype (no grey/NA) | Each group renders in its assigned colour | Would mean the computed label doesn't match a key; check n_distinct(raw_overall[["id"]]) at the console |
| <5 years / >=5 years subgroup series unchanged | Both still render with their existing colours and dashed linetype | No logic in those two groups was touched |
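The single-variable pattern checked above can be sketched as follows. The toy `raw_overall` data frame and the colour values are assumptions for illustration; only the "Overall (n=…)" label format and the two subgroup names come from the report.

```r
# Toy stand-in for raw_overall (several rows per participant id).
raw_overall <- data.frame(id = c("p01", "p01", "p02", "p03", "p03"))

# Compute the label once and reuse it as level, key, and legend text.
n_overall <- length(unique(raw_overall[["id"]]))
overall_label <- sprintf("Overall (n=%d)", n_overall)

group_levels <- c(overall_label, "<5 years", ">=5 years")

# Illustrative colours only; the keys are built from the same variable.
group_colours <- stats::setNames(c("black", "blue", "red"), group_levels)

# Every level resolves to a colour, so no series falls back to grey/NA.
group_colours[group_levels]
```

Because the key and the factor level come from the same string, a change in N renames both together instead of breaking the match.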
|
@claude Revert unrelated .gitignore change from previous batch
Before editing, run git status --short. Do only this:
After reverting, confirm:
Then stop. |
|
The temp file ResultChange made: Removed Verification:
Note on commit: Posted by claude-cli bypass workflow. View job |
|
@claude review |
|
Now I have enough to write a thorough review. Code Review — PR #28:
|
|
@claude Pre-review batch — Table 2 footnote newline (A)
Before editing, run
A. Embedded newline in Table 2 footnote (
|
|
The
Done: A, Table 2 footnote newline
What changed:
- "population mean (mu.par) of each parameter under the best-performing
- model."
+ "population mean (mu.par) of each parameter under the best-performing model."
Scan result: No other mid-string newlines found in the caption or footnote strings in this file. The caption (
Confirmation: (a) ✅ The Table 2 footnote is now one line, with no embedded newline in the quoted string.
Posted by claude-cli bypass workflow. View job |
|
@d-morrison, this branch is ready for your review. As I mentioned on Teams, could you start with the math/stat sections of
Below is a status summary so you know what's settled and what's intentionally left for later.
What this PR does
Restructures the Chapter 1 analysis into a reproducible package: 55 one-public-function-per-file R functions,
Known items intentionally left for a follow-up (not blocking this review)
1.
2. Values that can't be verified in CI (no data in the repo). The raw datasets aren't committed to GitHub, so data-derived counts can't be confirmed by CI or the automated reviewer; these are confirmed locally on Mercury via a real render. Flagging that CI cannot check them.
3. Footer/legend sample sizes left as literals (low risk). A couple of explanatory caption strings (the Table 1 footer composition and the forest-plot legend subtitle) still carry literal
4. No unit tests yet. The package currently has no test suite. This was a behavior-preserving refactor (verified by before/after comparison of the generated outputs). Plan: add smoke tests for the data-free pure helpers (and a
5. Minor design notes from the automated review (tie-breaking in
Thanks for taking a look; starting with the stats sections would be a big help. |
What this PR does
This branch makes Chapter 1 fully reproducible and adds the manuscript itself.
The whole analysis flows from one audited, packaged function set:
Every step is driven by small, single-purpose functions in R/ (loaded with library(shigella)) and thin generator scripts in data-raw/. Three Quarto documents under vignettes/manuscript/ consume those same packaged functions, so the paper and the figure/table checks always agree.
Not in this PR (by design): no raw data (lives on SharePoint), no fitted .rda (those are outputs, regenerated locally and git-ignored).
Three ways to review (everything in one PR)
- vignettes/manuscript/chapter1.qmd
- vignettes/manuscript/chapter1_main_figures_tables.qmd
- vignettes/manuscript/chapter1_supplement_figures_tables.qmd
All three documents call library(shigella) and therefore use the same packaged functions: a number that appears in the paper is the same number you see in the standalone figure/table check, because there is no second, divergent copy of the plotting/table code.
The manuscript files
- chapter1.qmd: the full manuscript "Antibody Kinetics Following Confirmed Shigella Infection: A Longitudinal Study in Dhaka, Bangladesh" (title, authors, abstract, methods, results, discussion, supporting information). Renders to docx and pdf. Its tables and figures are generated by the packaged functions (e.g. table1_study_population(), figure_crossreactivity_heatmap(), table2_kinetic_params(), figure_individual_comparison(), …) via labelled chunks (#| label: tbl-* / fig-*), so cross-references resolve automatically.
- references.bib: 39 references (35 manuscript citations + 4 methodology references); all in-text citations use [@key] syntax.
- plos-neglected-tropical-diseases.csl: PLOS NTD citation style (referenced by the csl: line in chapter1.qmd).
Design principles (reproducibility)
- data-raw/_config.R.
- .rda are outputs, not inputs. They are regenerated by data-raw/ and git-ignored; the package/CI never depend on them.
- chapter2-stan-simulation branch. Tightly-coupled internal helpers are co-located in the same file under a -- Internal helpers -- divider and marked #' @keywords internal + #' @noRd; the few helpers shared across functions get their own file and are flagged at call sites with # nolint: object_usage_linter.
- library(shigella)), not via ad-hoc source() of individual files, so the documents load the whole, audited function set the way the package will ship it.
- serodynamics::initsfunction()'s fixed per-chain seeds, so identical input data reproduces identical JAGS output.
How I worked on the files
How I worked on the files
- at each data-prep / rendering seam; pulled inner closures up to named helpers).
- mirroring chapter2-stan-simulation.
- the code (ignoring comments/blank lines) is an identical line-for-line multiset before vs. after, that all 89 top-level definitions appear exactly once across the function files, and that every file is bracket-balanced.
- library(shigella). All numeric citations from the source manuscript were converted to [@key] syntax; the three QMDs share an identical library(shigella) setup, so they all use the packaged functions.
Folder & file structure
How to reproduce (run order)
- devtools::load_all() (dev) or devtools::install() then library(shigella). Requires serodynamics (UCD-SERG), JAGS, and Stan.
- data-raw/_config.R (raw_data_dir, manuscript_data_dir).
- data-raw/00_build_case_data.R (it source()s _config.R for the shared paths/settings, then uses the packaged functions).
- data-raw/01..05_*.R -> writes the .rda (overall, serotype-specific, combined-flexneri, age-stratified, and the sensitivity fits). (This is the slow MCMC step.)
- rda_dir in its CONFIG chunk (where the .rda live); each document then does library(shigella) + load() the .rda:
  - chapter1.qmd -> the full manuscript as docx (for scientific review), or
  - chapter1_main_figures_tables.qmd / chapter1_supplement_figures_tables.qmd -> just the figures/tables.
In the main document, only Table 1 and Figure 1 need the raw Excel; if those paths are unset, those two chunks skip and everything else still renders from .rda.
Notes for review
- chapter1.qmd renders to docx + pdf; bibliography is references.bib (39 refs) with the PLOS NTD CSL.
- chapter1.qmd are produced by the packaged functions (not static placeholders), so they stay in sync with the standalone figure/table QMDs.