Potential Selection Bias for ALT/AST Testing #77

@davebridges

Gap 5: Selection/surveillance bias in ALT/AST subset (n=84) and HbA1c subset (n=17) — single preoperative values may be non-randomly sampled by obesity status

Labels: reviewer-response analysis data-extraction priority-medium

Reviewer summary

Only 84 of 365 Cushing's patients had pre-operative ALT/AST values within the one-year window. Reviewer's concern: if liver tests were selectively ordered when clinically suspected, the 84 patients with values are enriched for abnormal liver function, and if that enrichment differs by obesity stratum, the observed synergy could be partly artifactual. Reviewer proposes either (Option 1) IPW for testing probability, or (Option 2) a repeated-measures model using all pre-op values within 365 days.

My critique of the critique

Legitimate:

  • Surveillance bias in EHR lab availability is real. 23% coverage of ALT/AST in the Cushing's cohort is a flag.
  • Option 2's repeated-measures framing is the strongest response — it uses more of the available data and handles within-person variability properly.
  • The same concern applies even more acutely to the HbA1c subset (n=17), which the reviewer didn't flag.

Overstated or misdirected:

  • Reviewer's refs 1, 2 ("obese patients have different baseline liver enzyme profiles") don't support the argument as made. That baseline difference is already absorbed by the obesity main effect in the matched control comparison — it's the obesity-by-Cushing's interaction that would have to differ, which requires the sampling bias to itself differ by stratum. That's an empirical question answered by looking at testing prevalence, not cited literature.
  • "ALT/AST temporal variability" (ref 3) is true but is actually an argument for the repeated-measures approach, not a criticism of the current analysis.

Direction of bias is not obvious and worth saying explicitly:

  • Scenario A (reviewer's implied concern): obese patients get more labs ordered because clinicians anticipate NAFLD, so the obese-Cushing's cell is enriched for patients with abnormal values → interaction inflated.
  • Scenario B (equally plausible): lean Cushing's patients trigger more extensive workup because Cushing's in a lean person is unusual and prompts a broader differential, so the lean-Cushing's cell is enriched → interaction dampened and true synergy is even larger than reported.
  • Scenario C (routine pre-op labs): most pre-surgical ALT/AST values are part of standard metabolic panels ordered for anesthesia clearance, not driven by suspected liver pathology. In this case, sampling is closer to random and bias is minimal.

The prevalence-of-testing analysis below tells you which scenario applies.
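To make the direction-of-bias point concrete, the three scenarios can be simulated. This is an illustrative sketch only: every number in it (cell means, a true interaction of 10 U/L, the testing probabilities) is invented and nothing is estimated from the cohort. It uses base R so it runs on its own.

```r
# Illustrative simulation; all parameters are made up, not taken from the cohort.
set.seed(77)
n <- 40000
sim <- data.frame(
  cushings = rbinom(n, 1, 0.5),
  obese    = rbinom(n, 1, 0.5)
)
true_interaction <- 10
sim$alt <- 25 + 5 * sim$cushings + 8 * sim$obese +
  true_interaction * sim$cushings * sim$obese + rnorm(n, sd = 15)

# Testing probability: 25% at random, except in one "driven" cell where
# patients with higher ALT are more likely to be tested.
p_random <- rep(0.25, n)
p_driven <- plogis(-1.5 + 0.05 * (sim$alt - 30))

in_obese_cushings <- sim$cushings == 1 & sim$obese == 1
in_lean_cushings  <- sim$cushings == 1 & sim$obese == 0

u <- runif(n)  # shared draws, so scenarios differ only in the testing rule
tested_a <- u < ifelse(in_obese_cushings, p_driven, p_random)  # Scenario A
tested_b <- u < ifelse(in_lean_cushings,  p_driven, p_random)  # Scenario B
tested_c <- u < p_random                                       # Scenario C

# Interaction estimate using tested patients only
interaction_among_tested <- function(tested) {
  fit <- lm(alt ~ cushings * obese, data = sim[tested, ])
  unname(coef(fit)["cushings:obese"])
}

estimates <- c(
  A_obese_driven = interaction_among_tested(tested_a),
  B_lean_driven  = interaction_among_tested(tested_b),
  C_routine      = interaction_among_tested(tested_c)
)
round(estimates, 1)
```

Under these assumptions, outcome-driven testing in the obese-Cushing's cell pushes the estimate above the true value, the same mechanism in the lean-Cushing's cell pushes it below, and random testing recovers it. The size of the distortion depends entirely on the invented parameters.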

What needs to change

A. Prevalence of testing by stratum (cheap, do first)

For each outcome, report the fraction of the matched cohort with a usable pre-op value, by Cushing's × obesity stratum:

library(dplyr)

testing_prevalence <- analysis_df |>
  mutate(
    stratum = paste(if_else(Cushings == 1, "Cushing's", "Control"),
                    if_else(BMI >= 30, "obese", "lean"))
  ) |>
  group_by(stratum) |>
  summarise(
    n_total        = n(),
    n_with_alt     = sum(!is.na(alt)),
    n_with_ast     = sum(!is.na(ast)),
    n_with_hba1c   = sum(!is.na(hba1c)),
    pct_alt        = mean(!is.na(alt))   * 100,
    pct_ast        = mean(!is.na(ast))   * 100,
    pct_hba1c      = mean(!is.na(hba1c)) * 100
  )

This goes in the supplement as a new table. If pct_alt is similar between obese and lean Cushing's cases (say within 10 percentage points), differential surveillance bias for the interaction term is limited and the reviewer's concern is largely defused without further modeling. If it differs substantially, the IPW analysis below does real work.
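The 10-percentage-point comparison is easier to defend with an interval than with a point difference alone. The sketch below assumes a hypothetical split of the tested Cushing's patients by obesity; only the totals (84 tested of 365) come from this issue, and the per-stratum counts are invented for illustration.

```r
# Hypothetical split of the 84 tested / 365 total Cushing's patients by obesity.
# Only the totals come from the issue; the per-stratum counts are invented.
tested <- c(obese = 45, lean = 39)
total  <- c(obese = 190, lean = 175)
stopifnot(sum(tested) == 84, sum(total) == 365)

pct_tested <- 100 * tested / total

# Two-sample test of equal proportions; conf.int is the CI for obese minus lean
cmp <- prop.test(x = tested, n = total)
diff_pp    <- unname(pct_tested["obese"] - pct_tested["lean"])
diff_ci_pp <- 100 * cmp$conf.int

round(c(diff_pp = diff_pp, ci_low = diff_ci_pp[1], ci_high = diff_ci_pp[2]), 1)
```

With cells this small the interval will be wide, so a point difference under 10 points does not by itself rule out a larger true difference. Reporting the interval next to the percentages in the supplementary table makes that visible.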

B. IPW for testing probability (Reviewer's Option 1)

Model the probability of having a pre-op ALT/AST value, derive stabilized weights, refit the outcome model weighted.

library(WeightIt)
library(broom)
library(lubridate)  # provides year(), used below

# ALT/AST testing probability model
alt_testing_df <- analysis_df |>
  mutate(
    has_alt     = !is.na(alt),
    admit_year  = year(DeID_AdmitDate),
    days_to_sx  = as.numeric(cushings_procedure - DeID_AdmitDate)
  )

W <- weightit(
  has_alt ~ Cushings * Obesity + AgeInYears + GenderCode +
            RaceEthnicity + admit_year,
  data      = alt_testing_df,
  method    = "glm",
  estimand  = "ATE",
  stabilize = TRUE
)

alt_testing_df$w_testing <- W$weights

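# Refit ALT interaction model on tested patients, weighted.
# lm() treats the weights as fixed and known, so its model-based SEs are not
# reliable for IPW; use a robust (sandwich) or bootstrap variance for the reported CI.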
lm.alt.ipw <- lm(
  alt ~ GenderCode + RaceEthnicity + Cushings * Obesity,
  data    = alt_testing_df |> filter(has_alt),
  weights = w_testing
)

tidy(lm.alt.ipw, conf.int = TRUE) |>
  filter(term == "Cushings:ObesityObese")

Add a Table 4 row: "IPW for testing probability" for ALT, AST, and HbA1c.
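With only 84 tested patients, a handful of large weights can dominate the weighted fit, so it may be worth checking the weights before reporting the IPW row. A minimal sketch using the Kish effective sample size; the helper names and the toy weights are hypothetical, and in practice `w` would be `w_testing` restricted to tested patients.

```r
# Kish effective sample size: (sum of weights)^2 / sum of squared weights
effective_n <- function(w) sum(w)^2 / sum(w^2)

# Summarise a weight vector: nominal n, effective n, and how much
# of the total weight sits on the single most heavily weighted patient
weight_diagnostics <- function(w) {
  c(
    n         = length(w),
    ess       = effective_n(w),
    max_w     = max(w),
    max_share = max(w) / sum(w)
  )
}

# Toy weights (invented): 83 patients at 1 and one extreme weight
w_toy <- c(rep(1, 83), 25)
round(weight_diagnostics(w_toy), 2)
```

If the effective sample size is far below the nominal n, the IPW estimate rests on very few patients and that caveat belongs next to the Table 4 row.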

C. Repeated-measures model with all pre-op values (Reviewer's Option 2)

Extract all pre-op ALT/AST values within 365 days — not just the most recent one — and fit a mixed model.

library(lme4)
library(lmerTest)

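# Long-format ALT: multiple pre-op values per patient where available.
# Assumes lab_results_all already carries cushings_procedure and the
# patient-level covariates used in the model; join them from analysis_df first if not.
alt_long <- lab_results_all |>
  filter(lab_name == "ALT") |>
  mutate(days_to_sx = as.numeric(difftime(cushings_procedure, lab_date,
                                          units = "days"))) |>
  filter(days_to_sx >= 0, days_to_sx <= 365)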

# Mixed model with random intercept per patient
lmm.alt <- lmer(
  value ~ GenderCode + RaceEthnicity + Cushings * Obesity +
          days_to_sx + (1 | DeID_PatientID),
  data = alt_long
)

summary(lmm.alt)

Add a Table 4 row: "Repeated-measures (all pre-op values)" for ALT and AST.

Also useful: report n (patients) and k (observations) for this row, plus the within-patient variance component, to characterize how much information is actually being recovered by using multiple values.
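The n, k, and variance-component summary can be sketched in base R. The helper names and the toy IDs are hypothetical; with the real data, `id` would be `alt_long$DeID_PatientID`.

```r
# Patients (n), observations (k), k-to-n ratio, and patients with >1 value.
# `id` is a vector of patient identifiers, one entry per lab value.
repeat_coverage <- function(id) {
  per_patient <- table(id)
  c(
    n_patients    = length(per_patient),
    k_obs         = sum(per_patient),
    k_per_n       = sum(per_patient) / length(per_patient),
    n_with_repeat = sum(per_patient > 1)
  )
}

# Intraclass correlation from a random-intercept model's variance components
icc <- function(var_between, var_within) {
  var_between / (var_between + var_within)
}

# Toy IDs (invented): patient "a" has 3 values, "b" has 1, "c" has 2
coverage_toy <- repeat_coverage(c("a", "a", "a", "b", "c", "c"))
coverage_toy

# With the fitted model, the two components could be read from
# as.data.frame(lme4::VarCorr(lmm.alt))$vcov
# (patient intercept row first, residual row last).
```

A k-to-n ratio near 1 means the mixed model adds little beyond the primary analysis. An ICC near 1 means repeat values within a patient are close to redundant, which has a similar consequence.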

D. Bayesian parallel with posterior probability (recommended given the small n)

With n=84 for ALT/AST and n=17 for HbA1c, OLS produces wide intervals that are hard to interpret cleanly. brms with weakly informative priors gives you:

  1. Stable inference under small n via regularization.
  2. Posterior probabilities that translate directly to clinical claims.

library(brms)

priors <- c(
  prior(normal(0, 10),  class = "b"),
  prior(normal(0, 50),  class = "b", coef = "Cushings:ObesityObese"),
  prior(student_t(3, 0, 30), class = "sigma"),
  prior(student_t(3, 0, 20), class = "sd")
)

fit.alt.bayes <- brm(
  value ~ GenderCode + RaceEthnicity + Cushings * Obesity +
          days_to_sx + (1 | DeID_PatientID),
  data   = alt_long,
  prior  = priors,
  family = gaussian(),
  chains = 4, cores = 4, iter = 4000
)

draws <- as_draws_df(fit.alt.bayes)
mean(draws$`b_Cushings:ObesityObese` > 20)   # P(synergistic excess > 20 U/L)
mean(draws$`b_Cushings:ObesityObese` > 0)    # P(any synergy)

Report: "In Bayesian repeated-measures models with weakly informative priors, the posterior probability of any obesity × Cushing's synergy on ALT is X%, with Y% probability the synergistic excess exceeds 20 U/L above additivity."

E. Manuscript text

Methods (new paragraph):

"Because laboratory availability in EHRs may reflect selective testing, we characterized testing prevalence by Cushing's × obesity stratum (Supplementary Table S[N]) and fit two sensitivity analyses: (i) inverse-probability-of-testing weighted outcome models and (ii) mixed-effects models using all pre-operative values within 365 days before surgery, adjusting for time from lab draw to surgery."

Limitations (revised):

"ALT and AST were available for 84 of 365 Cushing's patients and HbA1c for only 17, raising the possibility of selective testing bias. Testing prevalence was [similar / higher in obese Cushing's / higher in lean Cushing's] across strata (Supplementary Table S[N]). The obesity × Cushing's interaction for transaminases [persisted / was attenuated] in inverse-probability-weighted and repeated-measures sensitivity analyses (Table 4), [supporting / qualifying] the robustness of the primary estimate. The small HbA1c subset limits interpretation for that outcome regardless of analytic approach."

Conclusion (soften if the repeated-measures or IPW estimate attenuates substantially):

Replace "This study provides novel data that obesity and Cushing's disease interact, suggesting that obesity and Cushing's disease may result in worsened liver damage"

with: "This study provides novel data that obesity and Cushing's disease interact to produce higher preoperative transaminase levels, though single-value sampling and incomplete laboratory coverage qualify the strength of causal inference."

Apply softening only if the sensitivity analyses don't support the primary result. If they do, keep the stronger language and add a pointer: "This conclusion is supported by sensitivity analyses accounting for selective testing and temporal variability (Table 4)."

Acceptance criteria

  • Supplementary table: testing prevalence by Cushing's × obesity stratum for ALT, AST, and HbA1c.
  • Table 4: new row "IPW for testing probability" for ALT, AST, HbA1c interaction estimates.
  • Table 4: new row "Repeated-measures (all pre-op values)" for ALT and AST.
  • Methods paragraph describing both sensitivity approaches.
  • Limitations paragraph updated.
  • Discussion conclusion revised only if sensitivity analyses attenuate the primary finding.
  • Optional but recommended given small n: Bayesian repeated-measures model with posterior probability of meaningful synergy reported alongside frequentist estimates.
  • If all pre-op ALT/AST values yield substantially more observations than just the most recent, document the k-to-n ratio in the response letter — it's a direct answer to "you only used one value per patient."

Notes / open questions

  • How many patients have multiple pre-op ALT/AST values in the 365-day window? This determines how much Option 2 actually does. If most patients only have one value anyway, the repeated-measures model collapses back toward the primary analysis.
  • Day-of-draw-to-surgery distribution. Worth reporting descriptively — if most draws are within 30 days of surgery (routine pre-op panels), the surveillance-bias concern is weaker than if draws are spread across the full year.
  • HbA1c n=17 is a separate problem. No sensitivity analysis will rescue a sample that small. For the HbA1c result, the cleanest response is to acknowledge the limitation prominently and report the Bayesian posterior as the headline rather than the frequentist p-value. The posterior is more honest about what can and cannot be concluded from 17 patients.
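For the draw-to-surgery question in the notes above, a short descriptive summary is probably enough. The sketch below assumes a numeric vector of days from lab draw to surgery (for example `days_to_sx`). The helper name, the 30-day default, and the toy values are illustrative, not cohort data.

```r
# Descriptive summary of days from lab draw to surgery
draw_timing <- function(days_to_sx, window = 30) {
  c(
    median_days = median(days_to_sx),
    q25         = unname(quantile(days_to_sx, 0.25)),
    q75         = unname(quantile(days_to_sx, 0.75)),
    pct_within  = 100 * mean(days_to_sx <= window)
  )
}

# Toy values (invented): four draws within 30 days, three spread over the year
timing_toy <- draw_timing(c(3, 7, 12, 28, 45, 200, 340))
round(timing_toy, 1)
```

A high share of draws inside the window would be consistent with routine pre-operative panels (Scenario C), though it does not prove that ordering was unrelated to suspected liver disease.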

References

  1. Baseline liver enzyme differences in obesity — already absorbed by the obesity main effect; cite only if the testing prevalence table shows no differential sampling.
  2. Same as reference 1.
  3. ALT/AST temporal variability — cite in support of the repeated-measures approach, not against the primary analysis.

(Locate full cites before response letter.)
