Skip to content

Sample size em edits#369

Open
d-morrison wants to merge 11 commits into
mainfrom
sample-size-em-edits
Open

Sample size em edits#369
d-morrison wants to merge 11 commits into
mainfrom
sample-size-em-edits

Conversation

@d-morrison

Copy link
Copy Markdown
Member

Thank you for taking the time to submit a pull request!

To maximize the chances of acceptance:

  • The title of your PR should briefly describe the change.
  • The body of your PR should contain Fixes #issue-number (if relevant).
  • Commit/merge messages to be included in NEWS.md should begin with -.
  • Code should follow the tidyverse style guide.
  • Documentation should use roxygen2, with Markdown syntax.
  • Contributions should include unit tests (using testthat).

For more information see Contributing.

@codecov

codecov Bot commented Feb 4, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@d-morrison d-morrison marked this pull request as ready for review May 19, 2026 02:40
Copilot AI review requested due to automatic review settings May 19, 2026 02:40

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new data-raw/ script to run parallelized seroincidence simulations for exploring sample-size behavior, and updates package metadata to include additional optional dependencies used by supporting workflows/scripts.

Changes:

  • Add data-raw/sample-size.R to simulate cross-sectional serology data and summarize CI coverage across scenarios.
  • Add furrr and svglite to Suggests in DESCRIPTION (for optional parallel mapping and SVG-based visual regression testing support).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
DESCRIPTION Adds optional dependencies (furrr, svglite) needed by scripts/testing tools.
data-raw/sample-size.R New simulation + summarization script for evaluating sample size / CI coverage under different incidence rates.

Comment thread data-raw/sample-size.R Outdated
Comment on lines +55 to +64
est <- est.incidence(
pop_data = csdata,
curve_params = dmcmc,
noise_params = cond,
lambda_start = 0.1,
build_graph = FALSE,
verbose = FALSE,
print_graph = FALSE,
antigen_isos = antibodies
)
Comment thread data-raw/sample-size.R Outdated
## Load longitudinal parameters

test_sim <- "https://osf.io/download/rtw5k" %>%
load_curve_params() %>%
Comment thread data-raw/sample-size.R Outdated
Comment on lines +41 to +52
csdata <- sim.cs(
curve_params = dmcmc,
lambda = lambda,
n.smpl = nrep,
age.rng = lifespan,
antigen_isos = antibodies,
n.mc = 0,
renew.params = TRUE, # Use different parameters for each simulation
add.noise = TRUE,
noise_limits = dlims,
format = "long"
)
Comment thread data-raw/sample-size.R
Comment on lines +100 to +105
results_100 <- simulate_seroincidence(lambda = 0.2, nrep = 100, n_sim = 300)

results_100_0.1 <- simulate_seroincidence(lambda = 0.1, nrep = 100, n_sim = 300)

results_100_0.01 <- simulate_seroincidence(lambda = 0.1, nrep = 100, n_sim = 300)

Comment thread data-raw/sample-size.R Outdated

results_100_0.1 <- simulate_seroincidence(lambda = 0.1, nrep = 100, n_sim = 300)

results_100_0.01 <- simulate_seroincidence(lambda = 0.1, nrep = 100, n_sim = 300)
Comment thread data-raw/sample-size.R
Comment on lines +150 to +154
final_table_50 <- results_50 |> generate_final_table(sample_size = 50)
final_table_100 <- results_100 |> generate_final_table(sample_size = 100)
final_table_100_0.1 <- results_100_0.1 |> generate_final_table(sample_size = 100)
ft_100_0.01 <- results_100_0.01 |> generate_final_table()
lambda <- function(results_table)
github-actions Bot added a commit that referenced this pull request May 19, 2026
@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 2m 38s —— View job


Code Review

  • Gather context and read changed files
  • Review DESCRIPTION changes
  • Review data-raw/sample-size.R
  • Post findings

Summary

This PR adds data-raw/sample-size.R — a simulation script for exploring CI coverage under different sample sizes and incidence rates — and adds furrr and svglite to Suggests. The DESCRIPTION changes are fine. The script has one critical runtime error, two lint failures that will block CI, and several deprecated API calls and reproducibility gaps.


Critical Bug — Will Error at Runtime

data-raw/sample-size.R, line 55–64est.incidence() is a pure ...-passthrough to est_seroincidence(). The argument curve_params = dmcmc has no matching parameter in est_seroincidence() (which uses sr_params, not curve_params); it will be absorbed by ... silently. But sr_params has no default value, so the call will immediately fail with argument "sr_params" is missing, with no default. The function will never run successfully as written.

Fix: replace the block with the current API:

est <- est_seroincidence(
  pop_data    = csdata,
  sr_params   = dmcmc,
  noise_params = cond,
  lambda_start = 0.1,
  build_graph  = FALSE,
  verbose      = FALSE,
  antigen_isos = antibodies
)

Fix this →


CI Lint Failures — Will Block Merge

Lines 102, 104, 152, 153 — Object names containing . violate the repo's object_name_linter (snake_case). The .lintr.R excludes pipe_consistency_linter and undesirable_function_linter for data-raw/, but object_name_linter is not excluded and applies here.

Line Current name Valid snake_case alternative
102 results_100_0.1 results_100_0_1
104 results_100_0.01 results_100_0_01
152 final_table_100_0.1 final_table_100_0_1
153 ft_100_0.01 ft_100_0_01

All downstream references to these names (lines 152, 153, 178–179) must be updated consistently.

Fix this →


Logic Bug — Mislabeled Result

Line 104results_100_0.01 is named to suggest λ=0.01, but simulate_seroincidence(lambda = 0.1, ...) passes λ=0.1 — the same as line 103. The coverage table it feeds into (ft_100_0.01) will therefore report results for the wrong scenario. Either change lambda = 0.1 to lambda = 0.01 or rename the object. (Copilot flagged this too.)


Reproducibility — Missing Seeds

Lines 102 and 104results_50 (line 97) and results_100 (line 100) each have a set.seed() call before them, which seeds furrr's parallel RNG via furrr_options(seed = TRUE). The two subsequent simulate_seroincidence() calls at lines 102 and 104 have no seed, so their output will differ across runs. Add set.seed() calls before each.


Deprecated API Calls — Will Emit Lifecycle Warnings

All three deprecated wrappers emit lifecycle::deprecate_soft() warnings and will eventually be removed. data-raw/ scripts are run by maintainers and should stay aligned with the current public API.

Line Deprecated call Current replacement
10 load_curve_params(url) load_sr_params(url)
41 sim.cs(curve_params=, n.smpl=, age.rng=, n.mc=, renew.params=, add.noise=, ...) sim_pop_data(curve_params=, n_samples=, age_range=, n_mcmc_samples=, renew_params=, add_noise=, ...)
55 est.incidence(pop_data=, curve_params=, noise_params=, ...) est_seroincidence(pop_data=, sr_params=, noise_params=, ...) (see Critical Bug above)

Fix this →


Minor Issues

  • Line 92: library(future) is called in the middle of the script, after the function definition. Move all library() calls to the top alongside the existing ones (lines 1–4). (undesirable_function_linter is excluded for data-raw/, so no lint failure, but it hurts readability.)

  • Line 120: 1:length(results_list) is fragile when results_list is empty (produces c(1, 0)). Prefer seq_along(results_list).

  • DESCRIPTION / future: The script calls future::plan() directly (line 93, 107) but future is not listed in Suggests. Since furrr depends on future, it will be available transitively, but explicit declaration makes the dependency intent clear.


DESCRIPTION Changes

furrr and svglite additions are correctly placed in Suggests (alphabetical order is preserved for furrr; svglite is placed between tidyverse and vdiffr which is fine). No concerns here.

github-actions Bot added a commit that referenced this pull request Jun 3, 2026
@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Claude Code is working…

I'll analyze this and get back to you.

View job run

- Replace deprecated load_curve_params/sim.cs/est.incidence with
  load_sr_params/sim_pop_data/est_seroincidence (and sr_params arg)
- Rename objects with dots to snake_case (object_name_linter)
- Fix mislabeled lambda=0.01 scenario and add per-call seeds
- Use seq_along(); declare future in Suggests; move library(future) to top

Copy link
Copy Markdown
Member Author

Addressed the review findings in data-raw/sample-size.R:

  • Critical/deprecated API: replaced load_curve_params()load_sr_params(), sim.cs()sim_pop_data() (renamed n.smpl/age.rng/n.mc/renew.params/add.noisen_samples/age_range/n_mcmc_samples/renew_params/add_noise), and est.incidence(curve_params=)est_seroincidence(sr_params=) (fixes the missing-sr_params runtime error).
  • snake_case: renamed results_100_0.1/results_100_0.01/final_table_100_0.1/ft_100_0.01 to underscore forms and updated all references.
  • Logic fix: the λ=0.01 scenario now actually passes lambda = 0.01.
  • Reproducibility: added set.seed() before the two unseeded runs.
  • Minor: seq_along() instead of 1:length(), moved library(future) to the top, and declared future in Suggests.

This is an internal maintainer script in data-raw/ (not user-visible / not installed), so I applied the no changelog + no version increment labels for the version-check and changelog gates, and added gh to WORDLIST to fix spellcheck.

@claude review


Generated by Claude Code

@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 2m 16s —— View job


Code Review

  • Gather context and read changed files
  • Verify previous findings were addressed
  • Check for new issues
  • Post findings

Summary

All findings from the previous review have been addressed. The script is substantially improved. A few minor remaining items below — none are blocking for a data-raw/ maintainer script.


Previous Findings — All Resolved ✅

Finding Status
Critical: est.incidence(curve_params=) ✅ Fixed → est_seroincidence(sr_params=)
Lint: dots in object names ✅ Fixed → snake_case underscores
Logic: λ=0.01 scenario used λ=0.1 ✅ Fixed → lambda = 0.01 on line 105
Seeds missing before two simulation calls ✅ Fixed → set.seed() before all 4 calls
Deprecated load_curve_params() ✅ Fixed → load_sr_params()
Deprecated sim.cs() ✅ Fixed → sim_pop_data()
1:length() fragile ✅ Fixed → seq_along()
library(future) mid-script ✅ Fixed → moved to top
future not in Suggests ✅ Fixed

Remaining Minor Items

Line 77 — 1:n_sim still fragile

The previous fix changed 1:length(results_list)seq_along(), but 1:n_sim (same class of issue) remains:

furrr::future_map(.x = 1:n_sim, ...)

Prefer seq_len(n_sim) — if n_sim = 0 (e.g. testing), 1:0 produces c(1, 0) and runs two iterations.

Fix this →


Lines 179–180 — Coverage not printed

coverage_count_50 and coverage_count_100 are captured and printed (lines 175–177), but the λ=0.1 and λ=0.01 results are silently dropped:

final_table_100_0_1 |> compute_coverage()   # result discarded
ft_100_0_01 |> compute_coverage()           # result discarded

When run non-interactively (e.g. Rscript data-raw/sample-size.R) these results are lost. For consistency, either assign or pipe through print().


Lines 182–183 — paste() on a data frame

compute_coverage() returns a data frame, so:

print(paste("Number of rows where CI covers true lambda:", coverage_count_50))

will produce something like "Number of rows where CI covers true lambda: c(no_error_count = …)" rather than a useful message. Consider cat() on individual columns or just print(coverage_count_50) directly.


DESCRIPTION / WORDLIST

furrr, future, and svglite correctly placed in Suggests (alphabetical order maintained). gh in WORDLIST is fine. No concerns.


@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

👀 Picked up by workflow run #26910326827. R setup runs first; Claude itself responds after that.

github-actions Bot added a commit that referenced this pull request Jun 3, 2026
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

📖 https://UCD-SERG.github.io/serocalculator/preview/pr369/
Preview documentation for this PR (at commit 4f21266)

Copy link
Copy Markdown
Member Author

@claude review


Generated by Claude Code

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

👀 Picked up by workflow run #26911728637. R setup runs first; Claude itself responds after that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants