[Repo Assist] fix: pass encoded effect modifiers to EconML when categorical variables present #1435
Draft
github-actions[bot] wants to merge 1 commit into main from
…es present
When effect modifiers include categorical columns they are one-hot encoded
internally by _set_effect_modifiers() (via _encode / OneHotEncoder). The
resulting DataFrame has new column names such as 'x7_0' / 'x7_1', but
self._effect_modifier_names still held the original name ('x7').
Six call sites then did df[self._effect_modifier_names], either on the
already-encoded DataFrame (raising KeyError for any categorical effect
modifier) or on the full raw dataset, passing unencoded columns on to the
fitted EconML model:
- effect() – called from estimate_effect() with X_test
- effect_interval() – same
- effect_inference() – same
- effect_tt() – called externally with the full raw DataFrame
- shap_values() – called externally with the full raw DataFrame
- estimate_effect() – when target_units is a DataFrame
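
The name mismatch described above can be reproduced outside DoWhy. The following is a minimal sketch, assuming pandas.get_dummies as a stand-in for the internal _encode() / OneHotEncoder step; the data and the variable names (raw, encoded, effect_modifier_names) are hypothetical and only illustrate why selecting by the original names fails.

```python
import pandas as pd

# Hypothetical data: 'x7' is a categorical effect modifier, 'x1' is numeric.
raw = pd.DataFrame(
    {
        "x1": [0.1, 0.4, 0.9, 0.3],
        "x7": pd.Categorical(["0", "1", "0", "1"]),
    }
)
effect_modifier_names = ["x1", "x7"]  # original (pre-encoding) names

# Stand-in for the one-hot encoding step (assumption: not DoWhy's actual code).
encoded = pd.get_dummies(raw[effect_modifier_names], columns=["x7"])
encoded_columns = list(encoded.columns)

# Selecting the original names on the encoded frame fails, because
# 'x7' has been replaced by 'x7_0' and 'x7_1'.
try:
    encoded[effect_modifier_names]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

Here encoded_columns no longer contains 'x7', so any later df[effect_modifier_names] on the encoded frame raises KeyError.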
Fix:
- effect / effect_interval / effect_inference: df is already the encoded
effect-modifier sub-frame passed in from estimate_effect(); drop the
redundant column-selection so apply_multitreatment receives it directly.
- effect_tt / shap_values: df is the full raw dataset; select the effect-
modifier columns and then encode them with the stored encoder before
use.
- estimate_effect (target_units DataFrame path): encode the incoming
DataFrame via the stored encoder so the EconML estimator sees data in
the same feature space it was trained on.
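
The "select, then encode with the stored encoder" pattern used for effect_tt / shap_values and the target_units path can be sketched as follows. This is an illustration under stated assumptions, not the code in dowhy/causal_estimators/econml.py: StoredOneHotEncoder is a hypothetical pandas-only helper that remembers the training-time columns, and all data is made up.

```python
import pandas as pd


class StoredOneHotEncoder:
    """Hypothetical helper: remembers the columns produced at fit time."""

    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        encoded = pd.get_dummies(df)
        self.columns_ = list(encoded.columns)
        return encoded

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Re-encode, then align to the training columns: levels missing
        # from the new data become all-zero columns, unseen levels are dropped.
        return pd.get_dummies(df).reindex(columns=self.columns_, fill_value=0)


effect_modifier_names = ["x1", "x7"]  # original names, as stored on the estimator

# Training time: encode once and keep the encoder.
train = pd.DataFrame(
    {"x1": [0.1, 0.4, 0.9, 0.3], "x7": pd.Categorical(["0", "1", "0", "1"])}
)
encoder = StoredOneHotEncoder()
X_train = encoder.fit_transform(train[effect_modifier_names])

# Later: a full raw dataset arrives (treatment, outcome, raw modifiers).
full = pd.DataFrame(
    {"treatment": [1, 0], "outcome": [2.0, 1.5], "x1": [0.2, 0.7], "x7": ["1", "1"]}
)

# Select by the ORIGINAL names on the raw frame, then encode with the
# stored encoder so the feature space matches training.
X_new = encoder.transform(full[effect_modifier_names])
```

Selecting by the original names is valid here because full is still unencoded; the stored encoder then guarantees that X_new has the same columns, in the same order, as X_train.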
Closes #820
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
🤖 This is an automated PR from Repo Assist, an AI assistant.
Closes #820.
Root Cause
When effect modifiers include categorical columns (dtype "category"), _set_effect_modifiers() one-hot encodes them via _encode() / OneHotEncoder. The resulting self._effect_modifiers DataFrame has new column names such as x7_0 / x7_1, but self._effect_modifier_names still held the original name 'x7'.

Six call sites then did df[self._effect_modifier_names] on the already-encoded DataFrame or on the full dataset and then expected to pass the result to the fitted EconML model:

- effect(): df is already the encoded EM frame from estimate_effect(); column selection with original names raises KeyError
- effect_interval(): same as effect()
- effect_inference(): same as effect()
- effect_tt(): called externally with the full raw DataFrame; the selected columns were not encoded
- shap_values(): same as effect_tt()
- estimate_effect() (target_units=DataFrame path): the incoming DataFrame was used as X_test without being encoded

Fix

- effect / effect_interval / effect_inference: df entering these methods is already the encoded effect-modifier sub-frame assembled by estimate_effect(). Remove the redundant (and broken) column selection; pass df directly to apply_multitreatment.
- effect_tt / shap_values: df is the full raw dataset. Select the effect-modifier columns and then encode them with the stored encoder so the EconML model sees data in the same feature space it was trained on.
- estimate_effect (target_units=DataFrame path): encode the incoming DataFrame via the stored encoder before using it as X_test.

Changes

- dowhy/causal_estimators/econml.py: six targeted call-site fixes
- tests/causal_estimators/test_econml_estimator.py: regression test test_categorical_effect_modifiers using a categorical effect modifier with backdoor.econml.dml.DML

Test Status

- black --check passes (reformatted one file)
- isort --check passes
- … _encode() call already used in fit()

Note
🔒 Integrity filter blocked 90 items
The following items were blocked because they don't meet the GitHub integrity level.
- list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter: