[Repo Assist] fix: pass encoded effect modifiers to EconML when categorical variables present #1435

Draft
github-actions[bot] wants to merge 1 commit into main from repo-assist/fix-issue-820-econml-categorical-effect-modifiers-a54bf192bc7aad0c

github-actions bot commented Apr 2, 2026

🤖 This is an automated PR from Repo Assist, an AI assistant.

Closes #820.

Root Cause

When effect modifiers include categorical columns (dtype "category"), _set_effect_modifiers() one-hot encodes them via _encode() / OneHotEncoder. The resulting self._effect_modifiers DataFrame has new column names such as x7_0 / x7_1, but self._effect_modifier_names still held the original name 'x7'.

Six call sites then selected df[self._effect_modifier_names], either from the already-encoded DataFrame or from the full dataset, and passed the result to the fitted EconML model:

| Method | Problem |
| --- | --- |
| effect() | df is already the encoded EM frame from estimate_effect(); column selection with original names raises KeyError |
| effect_interval() | same |
| effect_inference() | same |
| effect_tt() | selects raw EM cols from full dataset but does not encode before passing to fitted EconML model |
| shap_values() | same as effect_tt() |
| estimate_effect() (target_units=DataFrame path) | raw user DataFrame not encoded before being passed as X_test |
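
The KeyError described for effect(), effect_interval() and effect_inference() can be reproduced in isolation. The following is a minimal sketch, not DoWhy code: pd.get_dummies stands in for the internal _encode() / OneHotEncoder step, and the frame, the x0 column and the helper select_by_original_names are hypothetical names chosen for illustration.

```python
import pandas as pd

# Toy effect-modifier frame: x0 is numeric, x7 is categorical (names are illustrative).
raw = pd.DataFrame(
    {
        "x0": [0.1, 0.5, 0.9, 0.3],
        "x7": pd.Categorical([0, 1, 1, 0]),
    }
)
effect_modifier_names = ["x0", "x7"]  # original names, never updated after encoding

# Assumption: pd.get_dummies is only a stand-in for DoWhy's OneHotEncoder-based helper.
encoded = pd.get_dummies(raw, columns=["x7"], dtype=int)
encoded_columns = list(encoded.columns)


def select_by_original_names(df, names):
    """Mimic the broken call sites: index an encoded frame by pre-encoding names."""
    try:
        return df[names]
    except KeyError as err:
        return err


result = select_by_original_names(encoded, effect_modifier_names)
print(encoded_columns)
print(type(result).__name__)
```

Because 'x7' no longer exists as a column after encoding, the selection fails even though 'x0' is still present.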

Fix

  • effect / effect_interval / effect_inference: df entering these methods is already the encoded effect-modifier sub-frame assembled by estimate_effect(). Remove the redundant (and broken) column selection; pass df directly to apply_multitreatment.
  • effect_tt / shap_values: df is the full raw dataset. Select the effect-modifier columns and then encode them with the stored encoder so the EconML model sees data in the same feature space it was trained on.
  • estimate_effect (target_units=DataFrame path): encode the incoming DataFrame via the stored encoder before using it as X_test.
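
The idea behind the effect_tt / shap_values / target_units fixes, encoding new data with the state captured at fit time, can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: ColumnAligningEncoder is a hypothetical stand-in for the stored encoder, built on pd.get_dummies and reindex instead of scikit-learn's OneHotEncoder, and the data is made up.

```python
import pandas as pd


class ColumnAligningEncoder:
    """Hypothetical stand-in for a stored encoder: remembers the training layout."""

    def __init__(self, categorical_columns):
        self.categorical_columns = categorical_columns
        self.columns_ = None

    def fit_transform(self, df):
        out = pd.get_dummies(df, columns=self.categorical_columns, dtype=int)
        self.columns_ = list(out.columns)  # feature space seen at fit time
        return out

    def transform(self, df):
        out = pd.get_dummies(df, columns=self.categorical_columns, dtype=int)
        # Add categories absent from this batch as zeros and restore column order.
        return out.reindex(columns=self.columns_, fill_value=0)


train = pd.DataFrame({"x0": [0.1, 0.5, 0.9], "x7": [0, 1, 1], "y": [1.0, 2.0, 3.0]})
effect_modifier_names = ["x0", "x7"]

encoder = ColumnAligningEncoder(categorical_columns=["x7"])
X_train = encoder.fit_transform(train[effect_modifier_names])

# Full raw dataset at prediction time: select the effect modifiers, then encode.
new_units = pd.DataFrame({"x0": [0.2, 0.7], "x7": [1, 1], "y": [0.0, 0.0]})
X_test = encoder.transform(new_units[effect_modifier_names])
print(list(X_test.columns))
```

Note that this simplified stand-in silently drops categories it never saw during fitting; a real encoder may handle unseen categories differently.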

Changes

  • dowhy/causal_estimators/econml.py — six targeted call-site fixes
  • tests/causal_estimators/test_econml_estimator.py — regression test test_categorical_effect_modifiers using a categorical effect modifier with backdoor.econml.dml.DML

Test Status

  • black --check passes (after reformatting one file)
  • isort --check passes
  • ✅ All flake8 errors are pre-existing long-line docstrings and unused variables in existing tests; no new errors introduced by this PR
  • ℹ️ Full test suite with EconML installed could not be run in this CI environment; however the logic change is a straightforward removal of redundant column selection and addition of the same _encode() call already used in fit()

Note

🔒 Integrity filter blocked 90 items

The following items were blocked because they don't meet the GitHub integrity level.

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Repo Assist ·

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@b897c2f3e43bde9ff7923c8fa9211055b26e27cc

…es present

When effect modifiers include categorical columns they are one-hot encoded
internally by _set_effect_modifiers() (via _encode / OneHotEncoder). The
resulting DataFrame has new column names such as 'x7_0' / 'x7_1', but
self._effect_modifier_names still held the original name ('x7').

Six call sites then did df[self._effect_modifier_names], either on the
already-encoded DataFrame (raising KeyError for any categorical effect
modifier) or on raw data that was never encoded:

  - effect()          – called from estimate_effect() with X_test
  - effect_interval() – same
  - effect_inference() – same
  - effect_tt()       – called externally with the full raw DataFrame
  - shap_values()     – called externally with the full raw DataFrame
  - estimate_effect() – when target_units is a DataFrame

Fix:
- effect / effect_interval / effect_inference: df is already the encoded
  effect-modifier sub-frame passed in from estimate_effect(); drop the
  redundant column-selection so apply_multitreatment receives it directly.
- effect_tt / shap_values: df is the full raw dataset; select the effect-
  modifier columns and then encode them with the stored encoder before
  use.
- estimate_effect (target_units DataFrame path): encode the incoming
  DataFrame via the stored encoder so the EconML estimator sees data in
  the same feature space it was trained on.

Closes #820

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
github-actions bot added the automation, bug, and repo-assist labels Apr 2, 2026


Development

Successfully merging this pull request may close these issues.

Estimate Effect fails with Econml DML estimator
