Skip to content

Add km_by_hand_table() and na_by_hand_table() functions#899

Open
d-morrison wants to merge 8 commits into
mainfrom
claude/admiring-dijkstra-ithzii
Open

Add km_by_hand_table() and na_by_hand_table() functions#899
d-morrison wants to merge 8 commits into
mainfrom
claude/admiring-dijkstra-ithzii

Conversation

@d-morrison

Copy link
Copy Markdown
Owner

Ports the Kaplan-Meier and Nelson-Aalen "by-hand" table builders from ucdavis/epi204 PR #272 into rme as exported package functions.

Previously rme had no equivalent: it computed KM by hand inline in _subfiles/intro-to-survival-analysis/_sec-surv-KM-calculations.qmd (a dplyr worksheet printed with pander), and had no by-hand Nelson-Aalen table at all.

Changes

  • R/km_by_hand_table.Rkm_by_hand_table(data): step-by-step KM table, each event row shown as S_prev × factor = S_new.
  • R/na_by_hand_table.Rna_by_hand_table(data): Nelson-Aalen table showing H_prev + d/n = H_new and survival exp(-H) (uses the \expf macro already defined in rme).
  • Both take a data frame with time and death columns and return a knitr_kable of LaTeX math strings (escape = FALSE).
  • NAMESPACE + man/*.Rd: export and document both functions.
  • DESCRIPTION: move dplyr from Suggests to Imports and add utils (both used by the new functions).

Verification

  • lintr::lint() on both new files — no lints.
  • spelling::spell_check_package() — no spelling errors.
  • Ran both functions on a small survival example; output renders correct fractional KM/NA tables.

No .qmd/subfiles were edited, so no chapter render was required. The functions are not yet wired into any chapter — happy to add usage to the survival subfiles in a follow-up if desired.

https://claude.ai/code/session_011MedxXwDfmV4GnQVDDgzGv


Generated by Claude Code

Port the Kaplan-Meier and Nelson-Aalen by-hand table builders from
ucdavis/epi204 (PR #272) into rme as exported package functions.

- km_by_hand_table(): step-by-step KM table (S_prev x factor = S_new)
- na_by_hand_table(): Nelson-Aalen table (H_prev + d/n = H_new, exp(-H))

Both return knitr_kable objects of LaTeX math strings. Move dplyr into
Imports and add utils (both used by the new functions).

https://claude.ai/code/session_011MedxXwDfmV4GnQVDDgzGv
@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 3m 9s —— View job


Code Review

  • Gather context
  • Read changed files
  • Review R/km_by_hand_table.R
  • Review R/na_by_hand_table.R
  • Review DESCRIPTION, NAMESPACE, man/ pages
  • Post findings

The overall structure is clean and the fraction-arithmetic approach is clever. A few correctness issues and one portability concern are worth fixing before this is wired into a chapter.


Bug — Non-integer time values silently truncated

R/km_by_hand_table.R line 50 / R/na_by_hand_table.R line 50

Both output tibbles initialise the Time-point column as integer(nrow(df)):

`Time-point` = integer(nrow(df)),

Then on lines 125/97 the actual (numeric) df$time[i] is stored there. When time contains non-integer values (e.g. 1.5, 2.7) the assignment silently truncates them — 1.5 becomes 1. The @param doc says time is numeric, so this input is valid.

Fix: change the initialisation to numeric(nrow(df)) in both files. Fix this →


Bug — format() produces LaTeX-breaking scientific notation

R/km_by_hand_table.R lines 102, 113 / R/na_by_hand_table.R line 94

All three sites use format(x) to produce the decimal part of a math string, e.g.:

sprintf("$%s \\times %s = \\frac{%d}{%d} = %s$",
        prev_surv_str, frac_str, surv_n, surv_d, format(surv_n / surv_d))

format() respects getOption("digits") and falls back to scientific notation for small values — format(1e-6)"1e-06". That string is not valid in a LaTeX math environment (the - is a minus but e is text, not \times 10^{-6}), producing broken output for any dataset where survival drops near zero.

Fix: use formatC(x, digits = 4, format = "f") or sprintf("%.4f", x) at all three sites. Fix this →


Bug — Integer overflow in fraction tracking for larger datasets

R/km_by_hand_table.R lines 45–46, 73–74 / R/na_by_hand_table.R lines 45–46, 70–71

The running numerators and denominators are initialised as R integers (1L, 0L) and multiplied as integers. R's 32-bit integers overflow to NA (with a warning) when the product exceeds ~2.1 billion. GCD reduction keeps them small for the tiny teaching datasets this is designed for, but silently wrong results are possible for datasets with dozens of distinct event times at coprime risk-set sizes.

Fix: initialise as doubles (1, 0) instead of integers (1L, 0L) in both files. Fix this →


Portability concern — \expf is a book-only macro

R/na_by_hand_table.R line 92

surv_cell <- sprintf("$\\expf{-%s} \\approx %s$", cum_haz_str, format(surv_val))

\expf is a custom macro defined in latex-macros/macros.qmd. It is not a standard LaTeX command. Any use of na_by_hand_table() outside the book's render pipeline — a vignette, a pkgdown reference page, a standalone knitr document — will produce \expf{...} as literal text (undefined control sequence in pdflatex, or raw text in HTML). Consider documenting this constraint explicitly in @details, or substituting \exp\left\{...\right\} which is standard LaTeX. Fix this →


Code duplication — gcd_fn defined twice

R/km_by_hand_table.R lines 22–29 / R/na_by_hand_table.R lines 22–29

The GCD helper is identical in both files. As new by-hand table functions are added, this will proliferate further. Consider extracting it as an unexported package-level function (e.g. .gcd <- function(a, b) { ... } in R/utils-internal.R). Fix this →


Documentation — missing @examples (R CMD check NOTE)

R/km_by_hand_table.R line 11 / R/na_by_hand_table.R line 11

Exported functions without an \examples{} block generate a NOTE in R CMD check. A minimal \dontrun{} example with a three-row data frame would satisfy the check and demonstrate usage:

#' @examples
#' \dontrun{
#'   d <- data.frame(time = c(1, 2, 3, 3, 5), death = c(1, 0, 1, 1, 0))
#'   km_by_hand_table(d)
#' }

Minor — Risk set column also initialised as integer

Related to the Time-point issue: the Risk set column (integer(nrow(df)) in both files) is populated from ni <- as.integer(df$n_risk[i]), so the coercion is explicit and safe. No change needed, just noting it for completeness.

@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 2m 44s —— View job


Code Review

  • Read changed files
  • Review R/km_by_hand_table.R
  • Review R/na_by_hand_table.R
  • Review DESCRIPTION, NAMESPACE, man/ pages
  • Post findings

The structure is clean and the fraction-tracking approach is pedagogically sound. The previous automated review (06:35 UTC) already identified the main issues; I've independently verified all of them and found two additional items.


Bug — Time-point column silently truncates non-integer times

R/km_by_hand_table.R line 50 / R/na_by_hand_table.R line 50

`Time-point` = integer(nrow(df)),

The column is initialised as integer, but @param documents time as numeric. When line 125 (km) / line 97 (na) assigns df$time[i] — which can be non-integer — into an integer column, R silently truncates (e.g. 1.5 → 1).

Fix: change to numeric(nrow(df)) in both files.


Bug — format() produces LaTeX-invalid scientific notation

R/km_by_hand_table.R lines 102, 113 / R/na_by_hand_table.R line 94

format(c_n / c_d)        # line 102 km
format(surv_n / surv_d)  # line 113 km
format(surv_val)         # line 94 na

format() respects getOption("digits") and falls back to scientific notation for values near zero — e.g. format(1e-6) yields "1e-06". Inside a $...$ LaTeX block that string is not valid math (the e is text, not \times 10^{}), so any dataset where survival drops near zero produces a broken table cell.

Fix: replace with sprintf("%.4f", x) at all three sites.


Bug — integer overflow in cumulative fraction tracking

R/km_by_hand_table.R lines 45–46 / R/na_by_hand_table.R lines 45–46

# km
surv_n <- 1L ; surv_d <- 1L
# na
cum_haz_n <- 0L ; cum_haz_d <- 1L

The running numerators and denominators are R 32-bit integers. GCD reduction keeps them small for tiny teaching examples, but on a dataset with many event times at coprime risk-set sizes the product can exceed ~2.1 billion and overflow to NA (with a warning), silently producing wrong output.

Fix: initialise as doubles (1, 1 / 0, 1) in both files.


Portability concern — \expf is a book-only macro

R/na_by_hand_table.R line 92

surv_cell <- sprintf("$\\expf{-%s} \\approx %s$", cum_haz_str, format(surv_val))

\expf is defined in latex-macros/macros.qmd as \operatorname{exp}\cb{#1}. It is not a standard LaTeX command. Any use of na_by_hand_table() outside the book's render pipeline — a vignette, a pkgdown reference page, a standalone document — will produce an undefined control sequence error (PDF) or literal text (HTML).

Fix: either substitute standard LaTeX \\exp\\left\\{-%s\\right\\}, or add a @details note warning that this function requires the book's custom macro environment.


Code duplication — gcd_fn defined identically in both files

R/km_by_hand_table.R lines 22–29 / R/na_by_hand_table.R lines 22–29

The helper is byte-for-byte identical. As further by-hand table functions are added this will keep proliferating. Extract to an unexported package-level function (e.g. .gcd in R/utils-internal.R).


Documentation — missing @examples triggers R CMD check NOTE

R/km_by_hand_table.R line 11 / R/na_by_hand_table.R line 11

Both exported functions lack an \examples{} block, which generates a NOTE in R CMD check. A minimal \dontrun{} block suffices:

#' @examples
#' \dontrun{
#'   d <- data.frame(time = c(1, 2, 3, 3, 5), death = c(1, 0, 1, 1, 0))
#'   km_by_hand_table(d)
#' }

Additional — death column values are not validated

R/km_by_hand_table.R line 35 / R/na_by_hand_table.R line 35

deaths = sum(status),

The docs promise death is 0/1 integer or logical, but this is not enforced. A column with values 0/2 (e.g. from a merge accident) will silently double-count deaths and produce a table that looks plausible but is numerically wrong. A stopifnot(all(data$death %in% c(0L, 1L, TRUE, FALSE))) guard would surface this early.


DESCRIPTION / NAMESPACE — looks correct

Moving dplyr from Suggests to Imports and adding utils is the right call given both are now used unconditionally via ::. NAMESPACE exports are correct; :: calls don't need importFrom entries.


@d-morrison d-morrison marked this pull request as ready for review June 9, 2026 07:47
@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 3m 49s —— View job


Code Review

  • Read changed files
  • Review R/km_by_hand_table.R
  • Review R/na_by_hand_table.R
  • Review DESCRIPTION, NAMESPACE, man/ pages
  • Post findings

The structure is sound and the fraction-arithmetic approach works well for pedagogy. Two earlier automated reviews (06:35 and 07:06 UTC) found the major issues; I've independently verified all of them and found one additional item. Findings are grouped below; the five bugs are the blockers.


Bugs

Time-point column silently truncates non-integer times

R/km_by_hand_table.R:50 / R/na_by_hand_table.R:50

`Time-point` = integer(nrow(df)),

@param documents time as numeric. When line 125 (km) / line 97 (na) stores df$time[i] — which can be non-integer — into an integer column, R silently truncates: 1.5 → 1.

Fix: numeric(nrow(df)) in both files. Fix this →


format() produces LaTeX-invalid scientific notation

R/km_by_hand_table.R:102,113 / R/na_by_hand_table.R:94

format(c_n / c_d)       # km line 102
format(surv_n / surv_d) # km line 113
format(surv_val)        # na line 94

format() respects getOption("digits") and falls back to scientific notation for small values — format(1e-6)"1e-06". Inside $...$ math that string is invalid LaTeX (the e is text, not \times 10^{}), producing broken cells for any dataset where survival drops near zero.

Fix: sprintf("%.4f", x) at all three sites. Fix this →


Integer overflow in cumulative fraction tracking

R/km_by_hand_table.R:45–46 / R/na_by_hand_table.R:45–46

# km
surv_n <- 1L ; surv_d <- 1L
# na
cum_haz_n <- 0L ; cum_haz_d <- 1L

The running numerators/denominators are R 32-bit integers. GCD reduction keeps them small for tiny examples, but coprime risk-set sizes at many event times can push the product past ~2.1 billion → silent NA_integer_ overflow. The cascade is especially bad in the KM loop: once surv_n becomes NA, if (surv_n > 0L) evaluates to NA (skipping GCD reduction on line 75), so every subsequent row also overflows.

Fix: initialise as doubles (1, 1 / 0, 1) in both files. Fix this →


No guard against NA in the time column

R/km_by_hand_table.R:18 / R/na_by_hand_table.R:18

time <- data$time

NA values in time are not checked. dplyr::arrange(time) silently places NA rows at the bottom, and the risk-set computation on line 43 (n - left_before) becomes wrong for all subsequent rows, producing a table that looks plausible but is numerically incorrect. (The death validation suggested in the earlier review would catch NAs there, but time is a separate column.)

Fix: add stopifnot(!anyNA(data$time)) alongside the existing column-presence checks. Fix this →


\expf is a book-only macro

R/na_by_hand_table.R:92

surv_cell <- sprintf("$\\expf{-%s} \\approx %s$", cum_haz_str, format(surv_val))

\expf is defined in latex-macros/macros.qmd and is not a standard LaTeX command. Any use of na_by_hand_table() outside the book's render pipeline — a vignette, a pkgdown reference page, a standalone knitr document — will produce an undefined control sequence error (PDF) or literal text (HTML).

Fix: either use standard LaTeX \\exp\\left\\{-%s\\right\\}, or add a @details note warning that this function requires the book's custom macro environment. Fix this →


Code quality

gcd_fn defined identically in both files

R/km_by_hand_table.R:22–29 / R/na_by_hand_table.R:22–29

The helper is byte-for-byte the same in both files. Extract to an unexported package-level function (e.g. .gcd in R/utils-internal.R) to avoid drift as further by-hand table functions are added. Fix this →


Missing @examples triggers R CMD check NOTE

R/km_by_hand_table.R:11 / R/na_by_hand_table.R:11

Both exported functions lack an \examples{} block, which generates a NOTE in R CMD check. A minimal \dontrun{} block suffices:

#' @examples
#' \dontrun{
#'   d <- data.frame(time = c(1, 2, 3, 3, 5), death = c(1, 0, 1, 1, 0))
#'   km_by_hand_table(d)
#' }

death column values not validated

R/km_by_hand_table.R:35 / R/na_by_hand_table.R:35

deaths = sum(status),

Values outside {0, 1} (e.g. from a merge accident) silently produce a table that looks plausible but is numerically wrong. A stopifnot(all(data$death %in% c(0L, 1L, TRUE, FALSE))) guard would surface this early.


DESCRIPTION / NAMESPACE — correct

Moving dplyr from Suggests to Imports and adding utils is the right call given both are used unconditionally via ::. NAMESPACE exports are correct; :: calls don't require importFrom entries.


…h; add examples

- Time-point columns initialised as numeric() not integer() (preserve non-integer times)
- Replace format() with fixed-point formatC() in math strings (avoid LaTeX-breaking scientific notation near zero)
- Track running fraction numerators/denominators as doubles (avoid 32-bit integer overflow)
- Validate death is 0/1 or logical
- Add runnable @examples to both functions
- Document na_by_hand_table()'s \expf macro dependency in @details

https://claude.ai/code/session_011MedxXwDfmV4GnQVDDgzGv
@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 3m 36s —— View job


Code Review

  • Gather context
  • Read changed files (post-fix state at 1259e7a)
  • Review R/km_by_hand_table.R
  • Review R/na_by_hand_table.R
  • Review DESCRIPTION, NAMESPACE, man/ pages
  • Post findings

Good follow-through on the prior round of reviews. All five bugs called out in the 07:47 run have been correctly fixed; my independent verification of each is below, followed by one item that was raised but not addressed.


Fixes verified as correct

Time-point integer truncationnumeric(nrow(df)) in both files (km:63, na:71). ✓

format() scientific notation — replaced with fmt_dec() = formatC(x, format = "f", digits = 4, drop0trailing = TRUE) (km:38–40, na:45–47). Fixed at all call sites. ✓ The drop0trailing = TRUE flag is a nice touch: 0.5 renders as "0.5" rather than "0.5000", which is cleaner in the table cells.

Integer overflowsurv_n <- 1 ; surv_d <- 1 and cum_haz_n <- 0 ; cum_haz_d <- 1 are now doubles (km:58–59, na:66–67). Whole-number doubles stay exact up to 2^53, and %%/%/% work correctly on them. ✓

death validationall(data$death %in% c(0, 1)) in both files (km:19, na:26). This also correctly catches NA in death because NA %in% c(0, 1) returns FALSE in R (unlike is.element with NA, %in% never propagates NA). Logical TRUE/FALSE passes via numeric coercion. ✓

@examples — runnable (no \dontrun{}) in both .Rd files. knitr is in Imports so it's always available; the \expf{...} strings in the example output are just characters and don't cause errors during R CMD check EXAMPLES. ✓

\expf portability — documented in @details of na_by_hand_table.R (na:10–15) and propagated to the .Rd file. Acceptable given the function is intentionally book-scoped. ✓


One unaddressed item

Missing NA check for the time column

R/km_by_hand_table.R:16–20 / R/na_by_hand_table.R:23–27

The 07:47 review raised this; it's not in the author's fix list. The current stopifnot block validates death for NA (indirectly, via %in%) but not time:

stopifnot(
  "data must have a 'time' column"  = "time"  %in% names(data),
  "data must have a 'death' column" = "death" %in% names(data),
  "'death' must be 0/1 or logical"  = all(data$death %in% c(0, 1))
  # ← no guard for NA in time
)

dplyr::arrange(time) silently places NA rows at the end. Those rows inflate the denominator n but never match a valid time group, so every subsequent risk-set value is wrong — and the resulting table looks numerically plausible.

Fix: add "'time' must not contain NA" = !anyNA(data$time) alongside the existing checks. Fix this →


Minor observation

as.character(cum_haz_n) in R/na_by_hand_table.R:97

cum_haz_str <- if (cum_haz_d == 1) {
  as.character(cum_haz_n)
} else { ... }

as.character(1e15) produces "1e+15", which would be invalid LaTeX. For any realistic teaching dataset the cumulative hazard won't reach that magnitude, so this is not a bug in practice — just worth knowing if the function is ever exercised on larger data.


DESCRIPTION / NAMESPACE

dplyr in Imports, utils added — correct. NAMESPACE exports match. :: calls don't need importFrom entries. All good.


Overall the code is in solid shape post-fixes. The only actionable item is the time NA guard.

Copy link
Copy Markdown
Owner Author

Thanks for the reviews. Addressed in 1259e7a:

  • Non-integer time truncation — fixed. Time-point columns now initialised as numeric(nrow(df)) in both files.
  • format() scientific notation — fixed. Replaced with a local fmt_dec() = formatC(x, format = "f", digits = 4, drop0trailing = TRUE) at all three sites, so values near zero render as fixed-point (e.g. 0.0449) instead of 4.49e-02. Verified on an all-events dataset.
  • Integer overflow in fraction tracking — fixed. The running numerators/denominators are now doubles (whole-number values stay exact up to 2^53); %/%/%% work unchanged.
  • death not validated — fixed. Added stopifnot(all(data$death %in% c(0, 1))) (also accepts logical via coercion).
  • Missing @examples — added a small runnable example to each (no \dontrun{}, so it's actually exercised by R CMD check).
  • \expf portability — kept the \expf macro (per the book's convention of using the custom macros from latex-macros/macros.qmd) and documented the dependency in @details instead of switching to raw LaTeX.

One suggestion not taken — extracting gcd_fn to a shared .gcd: I tried this first, but a package-private helper in a separate file makes object_usage_linter emit "no visible global function definition" for every call site. The lint-changed-files.yaml job restores renv deps but never installs rme itself, so lintr can't load the package namespace to resolve the helper — and it sets LINTR_ERROR_ON_LINT: true, so it would fail CI (reproduced locally via lint_dir). I reverted to a local gcd_fn in each function (matching the upstream epi204 pattern) to keep the lint job green. Happy to extract it in a separate infra PR that also teaches the lint job to load the package namespace.

These functions were ported from ucdavis/epi204, where the same truncation / format() / overflow issues exist — I can open a corresponding fix PR there if useful.


Generated by Claude Code

@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 3m 16s —— View job


Code Review

  • Gather context
  • Read changed files
  • Review R/km_by_hand_table.R
  • Review R/na_by_hand_table.R
  • Review DESCRIPTION, NAMESPACE, man/ pages
  • Post findings

All five bugs called out in the prior review cycle (Time-point truncation, format() sci notation, integer overflow, missing @examples, \expf portability) are correctly fixed. Three findings below — one medium, two low.


Missing NA guard for time — medium (unaddressed carry-over)

R/km_by_hand_table.R:16–20 / R/na_by_hand_table.R:23–27

The stopifnot block validates death values but not time:

stopifnot(
  "data must have a 'time' column"  = "time"  %in% names(data),
  "data must have a 'death' column" = "death" %in% names(data),
  "'death' must be 0/1 or logical"  = all(data$death %in% c(0, 1))
  # ← no NA guard for time
)

dplyr::arrange(time) silently places NA rows at the end. Those rows inflate n but never match a valid time group in the summarise step, so every n_risk value is wrong — producing a table that looks plausible but is numerically incorrect. The 07:53 review raised this; it was not included in the fix commit.

Fix: add "'time' must not contain NA" = !anyNA(data$time) to the stopifnot block in both files. Fix this →


%d format applied to double variables — low

R/km_by_hand_table.R:125 / R/na_by_hand_table.R:100

surv_n/surv_d (km) and cum_haz_n/cum_haz_d (na) are initialised as doubles and accumulate through double arithmetic; they are formatted with %d:

# km line 125
sprintf("$%s \\times %s = \\frac{%d}{%d} = %s$",
        prev_surv_str, frac_str, surv_n, surv_d, ...)

# na line 100
sprintf("\\frac{%d}{%d}", cum_haz_n, cum_haz_d)

In practice this works — GCD operations keep the values as whole-number doubles, and R's sprintf coerces them via as.integer() — but it is semantically wrong and would produce either a silent truncation or an error (invalid format '%d'; use format %f, %e, %g or %G for numeric objects) if floating-point rounding ever produced a value like 97.000000001 rather than exactly 97.0. Using "%.0f" instead of "%d" for these positions would be both correct and defensive: sprintf("...\\frac{%.0f}{%.0f}...", surv_n, surv_d, ...). Fix this →


as.character() on cumulative hazard numerator — low (minor observation repeated)

R/na_by_hand_table.R:98

cum_haz_str <- if (cum_haz_d == 1) {
  as.character(cum_haz_n)   # "1e+15" for large values
} else { ... }

The prior review noted this: as.character(1e15)"1e+15", which is invalid inside a LaTeX $...$ block. For any realistic teaching dataset the cumulative hazard won't approach that magnitude, so this is not a practical bug. The %.0f fix above (point 2) applies here too — replacing as.character(cum_haz_n) with sprintf("%.0f", cum_haz_n) is consistent and removes the edge case. Fix this →


DESCRIPTION / NAMESPACE — correct

dplyr in Imports, utils added, knitr already present. NAMESPACE exports match. :: calls don't require importFrom entries. Runnable @examples in both .Rd files. All good.


@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor
PR Preview Action v1.8.1
🚀 Preview available at: https://d-morrison.github.io/rme/pr-preview/pr-899/

Built to branch gh-pages at 2026-06-09 08:18 UTC.

Copy link
Copy Markdown
Owner Author

@claude review

(Verifying the dispatch-permission fix merged in #911 — this should now fire a dispatched claude-code-review.yml run instead of a silent 403.)


Generated by Claude Code

@github-actions

Copy link
Copy Markdown
Contributor

👀 Picked up by workflow run #27715975209. R/Quarto/renv setup runs first (~3-5 min); Claude itself responds after that.

@claude

claude Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 2m 47s —— View job


Code Review

  • Gather context
  • Read changed files
  • Review R/km_by_hand_table.R
  • Review R/na_by_hand_table.R
  • Review DESCRIPTION, NAMESPACE, man/ pages
  • Post findings

Good state post-fixes. All five bugs from the earlier review cycle are correctly resolved. Three items remain — one medium carry-over that has now appeared in three consecutive automated reviews without being addressed, and two low-priority formatting consistency issues.


Carry-over — Missing NA guard for time

R/km_by_hand_table.R:16–20 / R/na_by_hand_table.R:23–27

The stopifnot block validates column presence and death values but not NA in time:

stopifnot(
  "data must have a 'time' column"  = "time"  %in% names(data),
  "data must have a 'death' column" = "death" %in% names(data),
  "'death' must be 0/1 or logical"  = all(data$death %in% c(0, 1))
  # ← no NA guard for time
)

dplyr::arrange(time) silently places NA rows at the end of the sorted frame. Those rows still count toward n, so every df$n_risk value from that point onward is inflated by the number of NA observations — producing a table that looks plausible but is numerically wrong.

This was raised at 07:47, 07:53, and 08:07 UTC on 2026-06-09 and was not included in commit 1259e7a.

Fix: add "'time' must not contain NA" = !anyNA(data$time) to both stopifnot blocks. Fix this →


Low — %d format used for double variables

R/km_by_hand_table.R:125 / R/na_by_hand_table.R:100

surv_n and surv_d (km) and cum_haz_n and cum_haz_d (na) are initialised as doubles (1, 1 / 0, 1) and remain doubles through all arithmetic. They are formatted with %d:

# km line 125
sprintf("$%s \\times %s = \\frac{%d}{%d} = %s$",
        prev_surv_str, frac_str, surv_n, surv_d, ...)

# na line 100
sprintf("\\frac{%d}{%d}", cum_haz_n, cum_haz_d)

R's sprintf coerces doubles for %d by truncating toward zero. For whole-number doubles — which these always are after GCD reduction — the output is correct. However the coercion generates a warning in environments where options(warn = 1) is set, and would silently truncate if floating-point rounding ever produced a value like 97.000000001. Using "%.0f" instead of "%d" is both correct and defensive.

Note: di and ni (used in the same sprintf calls) are produced by as.integer() and are genuine R integers — %d is correct for them.

Fix: replace %d with %.0f at km:99 (frac_str for c_n/c_d), km:125 (surv_n/surv_d), na:100 (cum_haz_n/cum_haz_d). Positions for di and ni are fine as-is. Fix this →


Low — as.character() on double hazard numerator

R/na_by_hand_table.R:98

cum_haz_str <- if (cum_haz_d == 1) {
  as.character(cum_haz_n)   # "1e+15" for large values
} else {
  sprintf("\\frac{%d}{%d}", cum_haz_n, cum_haz_d)
}

as.character(1e15)"1e+15", which is invalid inside a LaTeX $...$ block. For any realistic teaching dataset the cumulative hazard won't approach that magnitude, so this is not a practical bug. The %.0f fix above applies here too and eliminates the edge case.

Fix: replace as.character(cum_haz_n) with sprintf("%.0f", cum_haz_n) for consistency with the else branch. Fix this →


DESCRIPTION / NAMESPACE — correct

dplyr in Imports, utils added, knitr already present. NAMESPACE exports are correct; :: calls don't require importFrom entries. Runnable @examples in both .Rd files. @details portability note for \expf is in place. All good.

claude added 3 commits June 18, 2026 23:39
…nators

- Add stopifnot guard rejecting NA in 'time' (NA rows sort last but still
  count toward n, inflating the risk set)
- Format double-valued running numerators/denominators with %.0f instead of
  %d (avoids coercion warnings under options(warn=1) and float-rounding
  truncation); keep %d for the genuine-integer di/ni
- Replace as.character(cum_haz_n) with sprintf("%.0f", ...) to avoid
  scientific notation for large integer hazard numerators

https://claude.ai/code/session_011MedxXwDfmV4GnQVDDgzGv

Copy link
Copy Markdown
Owner Author

All three addressed in the latest push (45f304a):

  • NA guard for time — added "'time' must not contain NA" = !anyNA(data$time) to both stopifnot blocks. Verified it errors on NA times instead of silently inflating the risk set.
  • %d on doubles — switched the double-valued running numerators/denominators to %.0f: c_n/c_d and surv_n/surv_d in km_by_hand_table(), cum_haz_n/cum_haz_d in na_by_hand_table(). Left %d for di/ni, which are genuine integers from as.integer(). Re-ran both functions under options(warn = 2) with no coercion warnings.
  • as.character() on double — replaced with sprintf("%.0f", cum_haz_n), consistent with the fraction branch.

lint_dir and spell_check_package() both clean; output for normal datasets is unchanged.


Generated by Claude Code

@claude

claude Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 3m 21s —— View job


Code Review

  • Gather context
  • Read changed files (current HEAD after 45f304a)
  • Review R/km_by_hand_table.R
  • Review R/na_by_hand_table.R
  • Review DESCRIPTION, NAMESPACE, man/ pages
  • Post findings

All nine issues flagged across the prior review cycle are correctly addressed. Independent verification below.


All prior findings resolved ✓

Finding Fix Verified
Time-point truncates non-integer times numeric(nrow(df)) (km:64, na:72)
format() produces invalid LaTeX sci notation fmt_dec() = formatC(..., format="f", drop0trailing=TRUE) (km:39-41, na:46-48)
Integer overflow in fraction accumulators surv_n/surv_d and cum_haz_n/cum_haz_d now doubles (km:59-60, na:67-68)
death not validated all(data$death %in% c(0, 1)) in both stopifnot blocks (km:19, na:26)
Missing @examples (R CMD check NOTE) Runnable examples added to both functions
\expf portability undocumented @details note in na_by_hand_table.Rd
No NA guard for time (persistent carry-over) !anyNA(data$time) in both stopifnot blocks (km:20, na:27)
%d format used for double variables %.0f at all double-valued fraction positions (km:100,115,126; na:99,101)
as.character() on double hazard numerator sprintf("%.0f", cum_haz_n) (na:99)

The drop0trailing = TRUE flag in fmt_dec is a nice touch — "0.5" rather than "0.5000", which reads better in the table cells.

The %d remaining at km:105, km:110, na:84, na:86, na:105 is correct — those positions hold di and ni which are produced by explicit as.integer() calls and are genuine R integers.


No new findings

The logic is sound end-to-end:

  • Fraction arithmetic (both KM and NA) uses the Euclidean GCD to keep numerators/denominators in lowest terms at each step.
  • The dplyr::group_by(time) |> summarise(...) correctly aggregates tied event times before the loop.
  • The KM prev_surv_str is updated only on event rows (di > 0), so censoring-only rows display the unchanged estimate. Consistent.
  • The NA cumulative hazard string and survival are computed from the already-updated cum_haz_n/cum_haz_d on all rows. Consistent.
  • %in% c(0, 1) correctly rejects NA in death (because NA %in% c(0, 1) returns FALSE in R, not NA). Logical TRUE/FALSE passes via numeric coercion.

DESCRIPTION / NAMESPACE — correct

dplyr in Imports, utils added. knitr already present (needed for the runnable @examples). NAMESPACE exports match. :: calls don't require importFrom entries. All good.


Noted design decision (not a finding)

gcd_fn is still local to each function rather than extracted as a package-level helper. The author's explanation (lintr object_usage_linter emits false positives for unexported helpers when the lint CI job doesn't install the package) is a legitimate constraint. Local helpers are the right call here given that constraint.


This PR is ready from a code-correctness standpoint.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants