Gap 5: Selection/surveillance bias in ALT/AST subset (n=84) and HbA1c subset (n=17) — single preoperative values may be non-randomly sampled by obesity status
Labels: reviewer-response analysis data-extraction priority-medium
Reviewer summary
Only 84 of 365 Cushing's patients had pre-operative ALT/AST values within the one-year window. Reviewer's concern: if liver tests were selectively ordered when clinically suspected, the 84 patients with values are enriched for abnormal liver function, and if that enrichment differs by obesity stratum, the observed synergy could be partly artifactual. Reviewer proposes either (Option 1) IPW for testing probability, or (Option 2) a repeated-measures model using all pre-op values within 365 days.
My critique of the critique
Legitimate:
- Surveillance bias in EHR lab availability is real. ALT/AST coverage of 23% (84/365) in the Cushing's cohort is a red flag.
- Option 2's repeated-measures framing is the strongest response — it uses more of the available data and handles within-person variability properly.
- The same concern applies even more acutely to the HbA1c subset (n=17), which the reviewer didn't flag.
Overstated or misdirected:
- The reviewer's refs 1 and 2 ("obese patients have different baseline liver enzyme profiles") don't support the argument as made. A baseline difference between obese and lean patients is already absorbed by the obesity main effect in the matched-control comparison. To bias the obesity-by-Cushing's interaction, the sampling bias itself would have to differ by stratum, acting differently in Cushing's cases than in controls across obesity levels. That is an empirical question answered by looking at testing prevalence, not by cited literature.
- "ALT/AST temporal variability" (ref 3) is true but is actually an argument for the repeated-measures approach, not a criticism of the current analysis.
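The first point can be made concrete by writing the interaction as a difference of differences over the four cell means. This is a sketch under an assumed simplification: selective testing shifts the observed mean of each cell by an additive amount $\delta_{co}$ (notation introduced here, not from the manuscript). Let $\mu_{co}$ be the true mean ALT for Cushing's status $c$ (1 = Cushing's, 0 = control) and obesity status $o$ (1 = obese, 0 = lean):

$$
\beta_{C\times O} = (\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00})
$$

$$
\mathbb{E}\big[\hat\beta_{C\times O}\big] = \beta_{C\times O} + (\delta_{11} - \delta_{10}) - (\delta_{01} - \delta_{00})
$$

An obesity-driven shift shared by cases and controls ($\delta_{11} = \delta_{01}$ and $\delta_{10} = \delta_{00}$) cancels out. The bias term is nonzero only if selection differs between Cushing's and control patients differently in the obese and lean strata.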
Direction of bias is not obvious and worth saying explicitly:
- Scenario A (reviewer's implied concern): obese patients get more labs ordered because clinicians anticipate NAFLD, so the obese-Cushing's cell is enriched for patients with abnormal values → interaction inflated.
- Scenario B (equally plausible): lean Cushing's patients trigger more extensive workup because Cushing's in a lean person is unusual and prompts a broader differential, so the lean-Cushing's cell is enriched → interaction dampened, and the true synergy could be even larger than reported.
- Scenario C (routine pre-op labs): most pre-surgical ALT/AST values are part of standard metabolic panels ordered for anesthesia clearance, not driven by suspected liver pathology. In this case, sampling is closer to random and bias is minimal.
The prevalence-of-testing analysis below, together with the draw-timing distribution (see Notes / open questions), indicates which scenario is most plausible.
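A toy simulation can make the three scenarios concrete before looking at the real data. This is a sketch on simulated data only. True ALT has additive Cushing's and obesity effects with no interaction. The sample size, the logistic "suspicion" curve, and the 23% routine testing rate are assumptions chosen for illustration.

```r
# Simulated data only: no true Cushing's x obesity interaction exists here.
set.seed(2024)
n <- 20000
sim <- data.frame(
  cushings = rbinom(n, 1, 0.5),
  obese    = rbinom(n, 1, 0.5)
)
# True ALT (U/L): additive main effects plus noise
sim$alt <- 25 + 10 * sim$cushings + 8 * sim$obese + rnorm(n, sd = 15)

# Suspicion-driven ordering: chance of a lab rises with the patient's ALT level
p_suspect <- plogis((sim$alt - 45) / 8)
p_routine <- 0.23  # roughly the observed 84/365 coverage, applied at random

in_cell <- function(c, o) sim$cushings == c & sim$obese == o

# Fit the interaction model on tested patients only, as the primary analysis does
tested_interaction <- function(p_test) {
  tested <- rbinom(n, 1, p_test) == 1
  fit <- lm(alt ~ cushings * obese, data = sim[tested, ])
  coef(fit)[["cushings:obese"]]
}

est <- c(
  A = tested_interaction(ifelse(in_cell(1, 1), p_suspect, p_routine)),  # obese Cushing's tested on suspicion
  B = tested_interaction(ifelse(in_cell(1, 0), p_suspect, p_routine)),  # lean Cushing's tested on suspicion
  C = tested_interaction(rep(p_routine, n))                              # routine pre-op panels
)
round(est, 1)
```

Scenario A should yield a clearly positive spurious interaction and B a clearly negative one. C, random routine testing, should land near zero.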
What needs to change
A. Prevalence of testing by stratum (cheap, do first)
For each outcome, report the fraction of the matched cohort with a usable pre-op value, by Cushing's × obesity stratum:
```r
library(dplyr)

# Share of the matched cohort with a usable pre-op value, by Cushing's x obesity stratum
testing_prevalence <- analysis_df |>
  mutate(
    stratum = paste(if_else(Cushings == 1, "Cushing's", "Control"),
                    if_else(BMI >= 30, "obese", "lean"))
  ) |>
  group_by(stratum) |>
  summarise(
    n_total      = n(),
    n_with_alt   = sum(!is.na(alt)),
    n_with_ast   = sum(!is.na(ast)),
    n_with_hba1c = sum(!is.na(hba1c)),
    pct_alt      = mean(!is.na(alt)) * 100,
    pct_ast      = mean(!is.na(ast)) * 100,
    pct_hba1c    = mean(!is.na(hba1c)) * 100
  )
```
This goes in the supplement as a new table. If pct_alt is similar between obese and lean Cushing's cases (say within 10 percentage points), differential surveillance bias for the interaction term is limited and the reviewer's concern is largely defused without further modeling. If it differs substantially, the IPW analysis below does real work.
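The interaction is only distorted when selection differs by stratum in a Cushing's-specific way. A natural complement to the descriptive table is therefore a logistic model of testing with its own Cushings × Obesity term. This is a sketch on hypothetical toy data: `toy`, its size, and its testing rate stand in for `analysis_df`. It compares testing rates only and cannot show what prompted each order.

```r
# Hypothetical toy cohort standing in for analysis_df (replace with real data)
set.seed(7)
toy <- data.frame(
  Cushings = rep(c(1, 0), each = 400),
  BMI      = round(rnorm(800, mean = 31, sd = 6), 1)
)
toy$Obesity <- factor(ifelse(toy$BMI >= 30, "Obese", "Lean"), levels = c("Lean", "Obese"))
toy$has_alt <- rbinom(nrow(toy), 1, 0.23)  # 1 = usable pre-op ALT

# Probability of being tested; the Cushings:ObesityObese term asks whether the
# obese-vs-lean testing gap differs between Cushing's cases and controls
test_fit <- glm(has_alt ~ Cushings * Obesity, family = binomial, data = toy)
summary(test_fit)$coefficients["Cushings:ObesityObese", ]

# Wald 95% CI for the ratio of odds ratios
interaction_or_ci <- exp(confint.default(test_fit)["Cushings:ObesityObese", ])
interaction_or_ci
```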
B. IPW for testing probability (Reviewer's Option 1)
Model the probability of having a pre-op ALT/AST value, derive stabilized weights, and refit the outcome model with those weights.
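```r
library(dplyr)
library(lubridate)  # year()
library(WeightIt)
library(broom)
library(sandwich)   # robust variance for weighted fits
library(lmtest)

# ALT/AST testing probability model
alt_testing_df <- analysis_df |>
  mutate(
    has_alt    = as.integer(!is.na(alt)),
    admit_year = year(DeID_AdmitDate),
    days_to_sx = as.numeric(cushings_procedure - DeID_AdmitDate)  # not in the model below
  )

# With a binary "treatment" and estimand = "ATE", the stabilized weight for a
# tested patient should be P(tested) / P(tested | covariates), i.e. an
# inverse-probability-of-testing weight. Only tested patients' weights are used.
W <- weightit(
  has_alt ~ Cushings * Obesity + AgeInYears + GenderCode +
    RaceEthnicity + admit_year,
  data      = alt_testing_df,
  method    = "glm",
  estimand  = "ATE",
  stabilize = TRUE
)
alt_testing_df$w_testing <- W$weights

# Refit ALT interaction model on tested patients, weighted
lm.alt.ipw <- lm(
  alt ~ GenderCode + RaceEthnicity + Cushings * Obesity,
  data    = alt_testing_df |> filter(has_alt == 1),
  weights = w_testing
)
tidy(lm.alt.ipw, conf.int = TRUE) |>
  filter(term == "Cushings:ObesityObese")

# lm() treats weights as precision weights, so its SEs are model-based;
# sandwich (robust) SEs are the usual choice for IPW fits.
vc_robust <- vcovHC(lm.alt.ipw, type = "HC0")
coeftest(lm.alt.ipw, vcov. = vc_robust)["Cushings:ObesityObese", ]
coefci(lm.alt.ipw, vcov. = vc_robust)["Cushings:ObesityObese", ]
```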
Add a Table 4 row: "IPW for testing probability" for ALT, AST, and HbA1c.
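Filling the ALT, AST, and HbA1c rows requires one testing model per outcome. The sketch below computes the stabilized weights by hand in base R rather than through WeightIt, on hypothetical toy data. The column names `alt`/`ast`/`hba1c` follow the code above; `age`, the helper `ipw_interaction()`, and all data-generating values are assumptions. It also reports the Kish effective sample size and the largest weight, which show how much precision the weighting costs. The lm() SEs ignore that the weights were estimated, so robust SEs remain advisable for the reported intervals.

```r
# Hypothetical toy data; outcome values are on an arbitrary scale
set.seed(11)
n <- 1000
toy <- data.frame(
  Cushings = rbinom(n, 1, 0.35),
  Obesity  = factor(sample(c("Lean", "Obese"), n, replace = TRUE),
                    levels = c("Lean", "Obese")),
  age      = round(rnorm(n, mean = 45, sd = 12))
)
for (v in c("alt", "ast", "hba1c")) {
  y <- 30 + 8 * toy$Cushings + 6 * (toy$Obesity == "Obese") + rnorm(n, sd = 12)
  # Toy missingness: older patients are more likely to have the lab
  y[rbinom(n, 1, plogis(-1.2 + 0.03 * (toy$age - 45))) == 0] <- NA
  toy[[v]] <- y
}

ipw_interaction <- function(df, outcome) {
  df$tested <- as.integer(!is.na(df[[outcome]]))
  # 1. Testing-probability model (same covariates for every outcome in this sketch)
  ps <- fitted(glm(tested ~ Cushings * Obesity + age, family = binomial, data = df))
  # 2. Stabilized weight: marginal P(tested) / P(tested | covariates)
  df$w <- mean(df$tested) / ps
  obs <- df[df$tested == 1, ]
  # 3. Weighted outcome model on tested patients only
  fit <- lm(reformulate("Cushings * Obesity", response = outcome),
            data = obs, weights = w)
  data.frame(
    outcome  = outcome,
    estimate = unname(coef(fit)["Cushings:ObesityObese"]),
    n_tested = nrow(obs),
    ess      = sum(obs$w)^2 / sum(obs$w^2),  # Kish effective sample size
    max_w    = max(obs$w)
  )
}

table4_ipw <- do.call(rbind, lapply(c("alt", "ast", "hba1c"), ipw_interaction, df = toy))
table4_ipw
```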
C. Repeated-measures model with all pre-op values (Reviewer's Option 2)
Extract all pre-op ALT/AST values within 365 days — not just the most recent one — and fit a mixed model.
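```r
library(dplyr)
library(lme4)
library(lmerTest)  # masks lme4::lmer to add Satterthwaite df and p-values

# Long-format ALT: every pre-op value within 365 days per patient, not just the latest.
# Assumes lab_results_all has one row per draw (DeID_PatientID, lab_name, lab_date, value)
# and that analysis_df carries the surgery date and patient-level covariates.
# If lab_results_all already has those columns, drop the join.
alt_long <- lab_results_all |>
  filter(lab_name == "ALT") |>
  inner_join(
    analysis_df |>
      select(DeID_PatientID, cushings_procedure, GenderCode, RaceEthnicity,
             Cushings, Obesity),
    by = "DeID_PatientID"
  ) |>
  mutate(days_to_sx = as.numeric(cushings_procedure - lab_date)) |>
  filter(days_to_sx >= 0, days_to_sx <= 365)

# Mixed model with a random intercept per patient
lmm.alt <- lmer(
  value ~ GenderCode + RaceEthnicity + Cushings * Obesity +
    days_to_sx + (1 | DeID_PatientID),
  data = alt_long
)
summary(lmm.alt)
```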
Add a Table 4 row: "Repeated-measures (all pre-op values)" for ALT and AST.
Also useful: report n (patients) and k (observations) for this row, plus the within-patient variance component, to characterize how much information is actually being recovered by using multiple values.
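Here is a minimal lme4 sketch of those quantities on hypothetical toy long-format data (`alt_long_toy`; the draw counts and variance values are made up). `ngrps()` gives n, `nobs()` gives k, and `VarCorr()` gives the between- and within-patient variance components, from which an intraclass correlation (ICC) follows. A high ICC means extra draws from the same patient add relatively little beyond the first.

```r
library(lme4)

# Hypothetical toy data: 1-4 pre-op ALT draws per patient
set.seed(3)
n_pat <- 120
k_per <- sample(1:4, n_pat, replace = TRUE, prob = c(0.5, 0.25, 0.15, 0.10))
alt_long_toy <- data.frame(
  DeID_PatientID = rep(seq_len(n_pat), times = k_per),
  days_to_sx     = sample(0:365, sum(k_per), replace = TRUE)
)
patient_level <- rnorm(n_pat, sd = 12)                     # between-patient deviations
alt_long_toy$value <- 35 + patient_level[alt_long_toy$DeID_PatientID] +
  rnorm(nrow(alt_long_toy), sd = 8)                         # draw-to-draw noise

fit <- lmer(value ~ days_to_sx + (1 | DeID_PatientID), data = alt_long_toy)

vc <- as.data.frame(VarCorr(fit))
between_var <- vc$vcov[vc$grp == "DeID_PatientID"]
within_var  <- vc$vcov[vc$grp == "Residual"]

info <- data.frame(
  n_patients     = unname(ngrps(fit)["DeID_PatientID"]),
  k_observations = nobs(fit),
  n_multi_draw   = sum(table(alt_long_toy$DeID_PatientID) > 1),
  between_var    = between_var,
  within_var     = within_var,
  icc            = between_var / (between_var + within_var)
)
info
```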
D. Bayesian parallel with posterior probability (recommended given the small n)
With n=84 for ALT/AST and n=17 for HbA1c, OLS produces wide intervals that are hard to interpret cleanly. brms with weakly informative priors gives you:
- Stable inference under small n via regularization.
- Posterior probabilities that translate directly to clinical claims.
```r
library(brms)

priors <- c(
  prior(normal(0, 10), class = "b"),
  prior(normal(0, 50), class = "b", coef = "Cushings:ObesityObese"),
  prior(student_t(3, 0, 30), class = "sigma"),
  prior(student_t(3, 0, 20), class = "sd")
)

fit.alt.bayes <- brm(
  value ~ GenderCode + RaceEthnicity + Cushings * Obesity +
    days_to_sx + (1 | DeID_PatientID),
  data   = alt_long,
  prior  = priors,
  family = gaussian(),
  chains = 4, cores = 4, iter = 4000
)

draws <- as_draws_df(fit.alt.bayes)
mean(draws$`b_Cushings:ObesityObese` > 20)  # P(synergistic excess > 20 U/L)
mean(draws$`b_Cushings:ObesityObese` > 0)   # P(any synergy)
```
Report: "In Bayesian repeated-measures models with weakly informative priors, the posterior probability of any obesity × Cushing's synergy on ALT is X%, with Y% probability the synergistic excess exceeds 20 U/L above additivity."
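To fill X and Y, and to show how sensitive the claim is to the chosen cutoff, the draws can be summarized at several thresholds at once. The sketch below assumes a hypothetical helper `posterior_report()`. With the fitted model you would pass `draws$`b_Cushings:ObesityObese``; here it runs on toy normal draws that are not study results.

```r
# Hypothetical helper: posterior probability that the interaction exceeds each threshold
posterior_report <- function(draws_int, thresholds = c(0, 10, 20)) {
  data.frame(
    threshold_U_per_L = thresholds,
    posterior_prob    = vapply(thresholds, function(t) mean(draws_int > t), numeric(1))
  )
}

# Toy draws standing in for a real posterior (illustration only)
set.seed(9)
toy_draws <- rnorm(8000, mean = 15, sd = 10)

report <- posterior_report(toy_draws)
report
quantile(toy_draws, c(0.025, 0.5, 0.975))  # posterior median and 95% credible interval
```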
E. Manuscript text
Methods (new paragraph):
"Because laboratory availability in EHRs may reflect selective testing, we characterized testing prevalence by Cushing's × obesity stratum (Supplementary Table S[N]) and fit two sensitivity analyses: (i) inverse-probability-of-testing weighted outcome models and (ii) mixed-effects models using all pre-operative values within 365 days before surgery, adjusting for time from lab draw to surgery."
Limitations (revised):
"ALT and AST were available for 84 of 365 Cushing's patients and HbA1c for only 17, raising the possibility of selective testing bias. Testing prevalence was [similar / higher in obese Cushing's / higher in lean Cushing's] across strata (Supplementary Table S[N]). The obesity × Cushing's interaction for transaminases [persisted / was attenuated] in inverse-probability-weighted and repeated-measures sensitivity analyses (Table 4), [supporting / qualifying] the robustness of the primary estimate. The small HbA1c subset limits interpretation for that outcome regardless of analytic approach."
Conclusion (soften if the repeated-measures or IPW estimate attenuates substantially):
Replace "This study provides novel data that obesity and Cushing's disease interact, suggesting that obesity and Cushing's disease may result in worsened liver damage"
with: "This study provides novel data that obesity and Cushing's disease interact to produce higher preoperative transaminase levels, though single-value sampling and incomplete laboratory coverage qualify the strength of causal inference."
Apply softening only if the sensitivity analyses don't support the primary result. If they do, keep the stronger language and add a pointer: "This conclusion is supported by sensitivity analyses accounting for selective testing and temporal variability (Table 4)."
Acceptance criteria
Notes / open questions
- How many patients have multiple pre-op ALT/AST values in the 365-day window? This determines how much Option 2 actually does. If most patients only have one value anyway, the repeated-measures model collapses back toward the primary analysis.
- Day-of-draw-to-surgery distribution. Worth reporting descriptively — if most draws are within 30 days of surgery (routine pre-op panels), the surveillance-bias concern is weaker than if draws are spread across the full year.
- HbA1c n=17 is a separate problem. No sensitivity analysis will rescue a sample that small. For the HbA1c result, the cleanest response is to acknowledge the limitation prominently and report the Bayesian posterior as the headline rather than the frequentist p-value, because the posterior is more honest about what can and cannot be concluded.
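The first two open questions can be answered descriptively from the long-format table before any modeling. Below is a dplyr sketch on a hypothetical toy stand-in for `alt_long` (patient IDs and draw days are random). The 30-day cutoff mirrors the note above.

```r
library(dplyr)

# Hypothetical toy stand-in for alt_long (one row per pre-op draw)
set.seed(5)
alt_long_toy <- tibble(
  DeID_PatientID = sample(1:84, 150, replace = TRUE),
  days_to_sx     = sample(0:365, 150, replace = TRUE)
)

# 1. Distribution of the number of pre-op values per patient
values_per_patient <- alt_long_toy |>
  count(DeID_PatientID, name = "n_values") |>
  count(n_values, name = "n_patients")

# 2. Timing of draws relative to surgery
draw_timing <- alt_long_toy |>
  summarise(
    k_draws        = n(),
    median_days    = median(days_to_sx),
    q25_days       = unname(quantile(days_to_sx, 0.25)),
    q75_days       = unname(quantile(days_to_sx, 0.75)),
    pct_within_30d = mean(days_to_sx <= 30) * 100
  )

values_per_patient
draw_timing
```

Grouping by the Cushing's × obesity stratum before `summarise()` would also show whether draw timing differs by stratum.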
References
1. Baseline liver enzyme differences in obesity: already absorbed by the obesity main effect; cite only if the testing-prevalence table shows no differential sampling.
2. Baseline liver enzyme differences in obesity (second reference): same note as ref 1.
3. ALT/AST temporal variability: cite in support of the repeated-measures approach, not against the primary analysis.
(Locate full cites before response letter.)