fix: categorical encoding bug in Delta-method CI (review of #1432)#1444
Open
kf-rahman wants to merge 1 commit intopy-why:repo-assist/fix-issue-336-linear-regression-asymptotic-ci-4b5b9900c6c0a820from
Conversation
… add tests The overall approach is correct and well-structured: - Correctly identifies the Delta method as the solution: for ATE = c'β, Var(ATE) = c'Σc using model.cov_params() from statsmodels - Correctly uses scipy.stats.t with model.df_resid for finite-sample CIs - Correctly scales by (treatment_value - control_value) consistent with the existing no-modifier code path - max(var_ate, 0.0) guard against floating point negatives is good practice - _estimate_std_error and _estimate_confidence_intervals are both updated consistently via the shared _ate_and_se_for_treatment helper Bug: _ate_and_se_for_treatment used len(names) to count columns when building the contrast vector, but categorical variables are one-hot encoded by _encode() and expand into multiple columns (k-1 columns for k categories, with drop_first=True). This made interaction_start point at the wrong coefficient index, silently producing incorrect CIs with no error raised. Concretely: a 3-level categorical common cause W produces 2 encoded columns, but len(observed_common_causes_names) = 1, so interaction_start was off by 1, selecting a confounder dummy coefficient instead of the T·X interaction term. The same issue affected n_effect_modifiers when effect modifiers are categorical — len(effect_modifier_names) would undercount encoded columns, causing the em_means slice to be too short. 1. Replace len(self._observed_common_causes_names) with self._observed_common_causes.shape[1] to count actual encoded columns 2. Derive n_effect_modifiers from len(em_means) where em_means comes from self._effect_modifiers.mean(axis=0).to_numpy() — the already-encoded DataFrame — so the count always matches the actual column layout 3. Add an assert that n_params equals the expected total, turning silent wrong-index bugs into an immediate, descriptive error if column ordering ever changes in _build_features - test_ci_no_error_continuous_common_cause: baseline, no raise for continuous W - test_ci_no_error_categorical_common_cause: no raise for 3-level categorical W - test_ci_uses_actual_encoded_column_count_not_name_count: regression test that explicitly verifies shape[1] > len(names) for categorical W and that the internal assert passes (proving the right index is used) - test_ci_contains_estimate: CI brackets the estimated ATE value All 11 tests pass (7 existing + 4 new). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github-actions bot
added a commit
that referenced
this pull request
Apr 8, 2026
…hod CI _ate_and_se_for_treatment was computing interaction_start using len(variable_names) instead of the actual number of encoded columns. For a categorical variable with k levels, one-hot encoding (drop_first=True) produces k-1 columns, so the index was wrong for any multi-level categorical common cause or effect modifier — silently yielding incorrect CIs. Fixes: - Use self._observed_common_causes.shape[1] (encoded width) instead of len(self._observed_common_causes_names) for n_common_causes - Use self._effect_modifiers.mean(axis=0).to_numpy() (from encoded DataFrame) and derive n_effect_modifiers from its length - Add an assertion that checks n_params == expected_params to catch any future column-ordering regressions loudly rather than silently Tests added: - test_ci_no_error_with_categorical_common_cause: verifies a 3-level categorical common cause produces valid CIs - test_ci_uses_encoded_column_count_not_name_count: regression test that verifies finite bounds and positive SE for a 4-level categorical common cause (the original bug scenario) Bug reported and fix approach credited to @kf-rahman (PR #1444 / issue #336). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
This was referenced Apr 8, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes the categorical encoding bug identified in my review of #1432.
Changes
_ate_and_se_for_treatment: useself._observed_common_causes.shape[1]instead oflen(self._observed_common_causes_names)to count encoded columns (categorical variables expand via one-hot encoding)n_effect_modifiersfromlen(em_means)whereem_meanscomes from the already-encoded DataFrameassertthat catches column ordering mismatches early instead of silently producing wrong CIsTests added (merged with existing tests in
TestLinearRegressionAsymptoticCI)test_ci_no_error_continuous_common_causetest_ci_no_error_categorical_common_causetest_ci_uses_actual_encoded_column_count_not_name_count— regression test for this exact bugtest_ci_contains_estimateAll existing tests are preserved.