
fix: categorical encoding bug in Delta-method CI (review of #1432)#1444

Open
kf-rahman wants to merge 1 commit into py-why:repo-assist/fix-issue-336-linear-regression-asymptotic-ci-4b5b9900c6c0a820 from kf-rahman:pr-1432-upstream

Conversation

@kf-rahman

Fixes the categorical encoding bug identified in my review of #1432.

Changes

  • _ate_and_se_for_treatment: use self._observed_common_causes.shape[1] instead of len(self._observed_common_causes_names) to count encoded columns (categorical variables expand via one-hot encoding)
  • Derive n_effect_modifiers from len(em_means) where em_means comes from the already-encoded DataFrame
  • Add an assert that catches column ordering mismatches early instead of silently producing wrong CIs
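The column-count mismatch behind the first change can be shown in isolation. A minimal sketch (the column name `W0` and the toy data are invented for illustration): a 3-level categorical expands into k-1 = 2 encoded columns under one-hot encoding with `drop_first=True`, so counting variable names undercounts the encoded width.

```python
import pandas as pd

# A single categorical common cause with 3 levels
df = pd.DataFrame({"W0": ["a", "b", "c", "a"]})
names = list(df.columns)                       # ["W0"], so len(names) == 1

# One-hot encode the way _encode() does (drop_first=True)
encoded = pd.get_dummies(df, drop_first=True)  # columns: W0_b, W0_c

print(len(names))        # 1 -- what len(..._names) counts (the bug)
print(encoded.shape[1])  # 2 -- the actual encoded width (the fix)
```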

Tests added (merged with existing tests in TestLinearRegressionAsymptoticCI)

  • test_ci_no_error_continuous_common_cause
  • test_ci_no_error_categorical_common_cause
  • test_ci_uses_actual_encoded_column_count_not_name_count — regression test for this exact bug
  • test_ci_contains_estimate

All existing tests are preserved.

… add tests

The overall approach is correct and well-structured:
- Correctly identifies the Delta method as the solution: for ATE = c'β,
  Var(ATE) = c'Σc using model.cov_params() from statsmodels
- Correctly uses scipy.stats.t with model.df_resid for finite-sample CIs
- Correctly scales by (treatment_value - control_value) consistent with
  the existing no-modifier code path
- The max(var_ate, 0.0) guard against floating-point negatives is good practice
- _estimate_std_error and _estimate_confidence_intervals are both updated
  consistently via the shared _ate_and_se_for_treatment helper

Bug: _ate_and_se_for_treatment used len(names) to count columns when
building the contrast vector, but categorical variables are one-hot encoded
by _encode() and expand into multiple columns (k-1 columns for k categories,
with drop_first=True). This made interaction_start point at the wrong
coefficient index, silently producing incorrect CIs with no error raised.

Concretely: a 3-level categorical common cause W produces 2 encoded columns,
but len(observed_common_causes_names) = 1, so interaction_start was off by 1,
selecting a confounder dummy coefficient instead of the T·X interaction term.

The same issue affected n_effect_modifiers when effect modifiers are
categorical — len(effect_modifier_names) would undercount encoded columns,
causing the em_means slice to be too short.
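The off-by-one index can be traced through a hypothetical design-matrix layout (column names and the `[intercept, treatment, confounders, interactions]` ordering are invented for this sketch, consistent with the bug description above):

```python
# Layout for a 3-level categorical W and one effect modifier X0
columns = ["const", "T", "W_b", "W_c", "T*X0"]

n_common_causes_wrong = 1  # len(observed_common_causes_names): one name, "W"
n_common_causes_right = 2  # encoded width: W_b and W_c

# Interaction terms begin after the intercept, treatment, and confounders
interaction_start_wrong = 2 + n_common_causes_wrong  # index 3
interaction_start_right = 2 + n_common_causes_right  # index 4

print(columns[interaction_start_wrong])  # W_c  -- a confounder dummy (bug)
print(columns[interaction_start_right])  # T*X0 -- the intended interaction
```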

1. Replace len(self._observed_common_causes_names) with
   self._observed_common_causes.shape[1] to count actual encoded columns

2. Derive n_effect_modifiers from len(em_means) where em_means comes from
   self._effect_modifiers.mean(axis=0).to_numpy() — the already-encoded
   DataFrame — so the count always matches the actual column layout

3. Add an assert that n_params equals the expected total, turning silent
   wrong-index bugs into an immediate, descriptive error if column ordering
   ever changes in _build_features
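The three fixes above can be sketched together with invented stand-ins for the estimator's private attributes (the real fields live on the estimator and are not reproduced here; the `[intercept, treatment, confounders, interactions]` parameter layout is an assumption of this sketch):

```python
import numpy as np
import pandas as pd

# Stand-ins for the already-encoded DataFrames the estimator holds
observed_common_causes = pd.get_dummies(
    pd.DataFrame({"W": ["a", "b", "c", "a"]}), drop_first=True
)
effect_modifiers = pd.get_dummies(
    pd.DataFrame({"X": ["u", "v", "u", "w"]}), drop_first=True
)

# Fix 1: count actual encoded columns, not variable names
n_common_causes = observed_common_causes.shape[1]        # 2 (W_b, W_c)

# Fix 2: derive the modifier count from the encoded means vector
em_means = effect_modifiers.mean(axis=0).to_numpy()
n_effect_modifiers = len(em_means)                       # 2 (X_v, X_w)

# Fix 3: assert the fitted parameter count matches the assumed layout,
# so a future reordering in _build_features fails loudly, not silently
n_params = 1 + 1 + n_common_causes + n_effect_modifiers  # const + T + rest
fitted_params = np.zeros(n_params)                       # stand-in for model.params
assert len(fitted_params) == n_params, (
    "parameter layout mismatch: did _build_features column ordering change?"
)
```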

- test_ci_no_error_continuous_common_cause: baseline, no raise for continuous W
- test_ci_no_error_categorical_common_cause: no raise for 3-level categorical W
- test_ci_uses_actual_encoded_column_count_not_name_count: regression test
  that explicitly verifies shape[1] > len(names) for categorical W and that
  the internal assert passes (proving the right index is used)
- test_ci_contains_estimate: CI brackets the estimated ATE value

All 11 tests pass (7 existing + 4 new).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github-actions bot added a commit that referenced this pull request Apr 8, 2026
…hod CI

_ate_and_se_for_treatment was computing interaction_start using
len(variable_names) instead of the actual number of encoded columns.
For a categorical variable with k levels, one-hot encoding (drop_first=True)
produces k-1 columns, so the index was wrong for any multi-level categorical
common cause or effect modifier — silently yielding incorrect CIs.

Fixes:
- Use self._observed_common_causes.shape[1] (encoded width) instead of
  len(self._observed_common_causes_names) for n_common_causes
- Use self._effect_modifiers.mean(axis=0).to_numpy() (from encoded DataFrame)
  and derive n_effect_modifiers from its length
- Add an assertion that checks n_params == expected_params to catch any
  future column-ordering regressions loudly rather than silently

Tests added:
- test_ci_no_error_with_categorical_common_cause: verifies a 3-level
  categorical common cause produces valid CIs
- test_ci_uses_encoded_column_count_not_name_count: regression test
  that verifies finite bounds and positive SE for a 4-level categorical
  common cause (the original bug scenario)

Bug reported and fix approach credited to @kf-rahman (PR #1444 / issue #336).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>