[Repo Assist] fix: pass encoded effect modifiers to EconML when categorical variables present by github-actions[bot] · Pull Request #1435 · py-why/dowhy

github-actions · 2026-04-02T11:25:44Z

🤖 This is an automated PR from Repo Assist, an AI assistant.

Closes #820.

Root Cause

When effect modifiers include categorical columns (dtype "category"), _set_effect_modifiers() one-hot encodes them via _encode() / OneHotEncoder. The resulting self._effect_modifiers DataFrame has new column names such as x7_0 / x7_1, but self._effect_modifier_names still held the original name 'x7'.

Six call sites were affected. Each either did df[self._effect_modifier_names] on the already-encoded DataFrame, or took raw columns from the full dataset or a user-supplied DataFrame, and then passed the result to the fitted EconML model:

| Method | Problem |
| --- | --- |
| `effect()` | `df` is already the encoded EM frame from `estimate_effect()`; column selection with original names raises `KeyError` |
| `effect_interval()` | same |
| `effect_inference()` | same |
| `effect_tt()` | selects raw EM cols from full dataset but does not encode before passing to fitted EconML model |
| `shap_values()` | same as `effect_tt()` |
| `estimate_effect()` (target_units=DataFrame path) | raw user DataFrame not encoded before being passed as `X_test` |
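The name mismatch behind the `KeyError` can be reproduced outside DoWhy. The following is a minimal sketch, not the estimator's actual code: it assumes a toy frame with one categorical column `x7` and one numeric column `x8`, and uses `pd.get_dummies` as a stand-in for the `_encode()` / `OneHotEncoder` step described above.

```python
import pandas as pd

# Toy data (illustrative only): x7 is a categorical effect modifier, x8 is numeric.
raw = pd.DataFrame(
    {
        "x7": pd.Categorical([0, 1, 0, 1]),
        "x8": [0.5, 1.2, -0.3, 2.0],
    }
)

# Original names, which are never updated after encoding.
effect_modifier_names = ["x7", "x8"]

# Stand-in for the one-hot step; DoWhy itself uses OneHotEncoder via _encode().
encoded = pd.get_dummies(raw, columns=["x7"], dtype=float)
print(list(encoded.columns))

# Selecting by the original names fails because "x7" no longer exists.
try:
    encoded[effect_modifier_names]
    selection_failed = False
except KeyError:
    selection_failed = True
print(selection_failed)
```

The encoded frame carries `x7_0` and `x7_1` in place of `x7`, so any selection keyed on the original name cannot succeed.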

Fix

- effect / effect_interval / effect_inference: df entering these methods is already the encoded effect-modifier sub-frame assembled by estimate_effect(). Remove the redundant (and broken) column selection; pass df directly to apply_multitreatment.
- effect_tt / shap_values: df is the full raw dataset. Select the effect-modifier columns and then encode them with the stored encoder so the EconML model sees data in the same feature space it was trained on.
- estimate_effect (target_units=DataFrame path): encode the incoming DataFrame via the stored encoder before using it as X_test.
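The "select, then encode with the stored encoder" pattern can be sketched as follows. This is an illustration under assumptions, not the code in dowhy/causal_estimators/econml.py: `fit_encoding` and `apply_encoding` are hypothetical helpers, and `pd.get_dummies` again stands in for the stored `OneHotEncoder`.

```python
import pandas as pd


def fit_encoding(train, categorical):
    """Hypothetical helper: record categories and encoded column order at fit time."""
    categories = {c: list(train[c].cat.categories) for c in categorical}
    columns = list(pd.get_dummies(train, columns=categorical, dtype=float).columns)
    return categories, columns


def apply_encoding(df, categories, columns):
    """Hypothetical helper: encode new data into the feature space seen at fit time."""
    df = df.copy()
    for name, cats in categories.items():
        # Reuse the training categories so levels absent from df still get a column.
        df[name] = pd.Categorical(df[name], categories=cats)
    encoded = pd.get_dummies(df, columns=list(categories), dtype=float)
    # Enforce the training column order.
    return encoded.reindex(columns=columns, fill_value=0.0)


# Training-time effect modifiers (illustrative data).
train = pd.DataFrame({"x7": pd.Categorical([0, 1, 0, 1]), "x8": [0.5, 1.2, -0.3, 2.0]})
categories, columns = fit_encoding(train, ["x7"])

# Full raw dataset at prediction time: select the effect modifiers, then encode.
new = pd.DataFrame({"x7": [0, 0], "x8": [1.0, 2.0], "y": [3.0, 4.0]})
effect_modifier_names = ["x7", "x8"]
x_test = apply_encoding(new[effect_modifier_names], categories, columns)
print(list(x_test.columns))
```

Keeping the fit-time categories matters because a prediction-time frame may not contain every level; here `x7_1` is still produced (all zeros) even though `new` only contains level 0.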

Changes

- dowhy/causal_estimators/econml.py — six targeted call-site fixes
- tests/causal_estimators/test_econml_estimator.py — regression test test_categorical_effect_modifiers using a categorical effect modifier with backdoor.econml.dml.DML

Test Status

✅ black --check passes (after reformatting one file)
✅ isort --check passes
✅ All flake8 errors are pre-existing long-line docstrings and unused variables in existing tests; no new errors introduced by this PR
ℹ️ Full test suite with EconML installed could not be run in this CI environment; however the logic change is a straightforward removal of redundant column selection and addition of the same _encode() call already used in fit()

Note

🔒 Integrity filter blocked 90 items

The following items were blocked because they don't meet the GitHub integrity level.

Issue with distance_metric_params for dowhy.causal_estimators.distance_matching_estimator #1390 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
data subset refuter bug when dataframe has categorical columns #1372 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
Error in the tutorial of "Finding optimal adjustment sets" #1360 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
Error in do() operator for interventions outside original treatment distribution #1344 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
PC: background_knowledge not strictly enforced (tiers & forbidden_edges); results vary by uc_rule / uc_priority #1343 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
Issue with Logistic Regression in Mediation Analysis #1335 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
Interventional samples gives inconsistent results. #1307 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
falsify_graph uses np.math which does not exist #1268 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
Counterfactual Samples giving invalid values of effect in dowhy gcm #1241 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
Error with Do operator. #1240 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
model.estimate_effect and model.refute_astimate throws 'A column-vector y was passed ...' error #1212 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
No Backdoor Path Available #1188 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
numpy.dual is dropped but it still occurs in dowhy #1181 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
Inconsistency in the placebo_treatment_refuter when using estimate_effect of IV #1180 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
Causal Graph not provided. DoWhy will construct a graph based on data inputs. #1125 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
numpy has no attribute 'long' #1052 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
... and 74 more items

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Repo Assist

To install this agentic workflow, run
gh aw add githubnext/agentics/workflows/repo-assist.md@b897c2f3e43bde9ff7923c8fa9211055b26e27cc

…es present

When effect modifiers include categorical columns they are one-hot encoded internally by _set_effect_modifiers() (via _encode / OneHotEncoder). The resulting DataFrame has new column names such as 'x7_0' / 'x7_1', but self._effect_modifier_names still held the original name ('x7').

Four call sites then did df[self._effect_modifier_names] on the already-encoded DataFrame, raising KeyError for any categorical effect modifier:

- effect() – called from estimate_effect() with X_test
- effect_interval() – same
- effect_inference() – same
- effect_tt() – called externally with the full raw DataFrame
- shap_values() – called externally with the full raw DataFrame
- estimate_effect() – when target_units is a DataFrame

Fix:

- effect / effect_interval / effect_inference: df is already the encoded effect-modifier sub-frame passed in from estimate_effect(); drop the redundant column-selection so apply_multitreatment receives it directly.
- effect_tt / shap_values: df is the full raw dataset; select the effect-modifier columns and then encode them with the stored encoder before use.
- estimate_effect (target_units DataFrame path): encode the incoming DataFrame via the stored encoder so the EconML estimator sees data in the same feature space it was trained on.

Closes #820

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

github-actions bot added the automation, bug, and repo-assist labels Apr 2, 2026

github-actions bot mentioned this pull request Apr 3, 2026

[Repo Assist] Monthly Activity 2026-04 #1433

github-actions[bot] wants to merge 1 commit into main from repo-assist/fix-issue-820-econml-categorical-effect-modifiers-a54bf192bc7aad0c
