[Repo Assist] fix: compute proper chi-squared p-value for CMI independence test in GraphRefuter by github-actions[bot] · Pull Request #1431 · py-why/dowhy

github-actions · 2026-03-31T17:33:03Z

🤖 This PR was created by Repo Assist, an automated AI assistant.

Summary

Fixes a statistical correctness bug in the conditional_mutual_information() method of GraphRefuter where raw conditional mutual information (in bits) was being compared against 0.05 as if it were a p-value.

Closes #413

Root Cause

conditional_MI() returns an information-theoretic quantity (CMI in bits), not a probability. The old code:

cmi_val = conditional_MI(data=self._data, x=x, y=y, z=list(z))
if cmi_val <= 0.05:  # wrong: entropy bits ≠ p-value
    self._true_implications.append([x, y, z])

CMI (bits) is always ≥ 0 and can be arbitrarily large. Comparing it to 0.05 has no statistical meaning and gives unreliable independence test results.

Fix

Apply the G-test (log-likelihood ratio test for independence). Under the null hypothesis of conditional independence, the test statistic G = 2N · CMI_nats is asymptotically chi-squared with degrees of freedom (|X|−1)(|Y|−1) · |Z_combos|. Converting CMI from bits to nats and computing the chi-squared survival function yields a proper p-value:

g_stat = 2 * n * cmi_bits * log(2)        # convert to G-test statistic
df = max(1, (x_card - 1) * (y_card - 1) * z_card)
p_value = float(chi2.sf(g_stat, df=df))

if p_value >= 0.05:   # same semantics as partial_correlation
    self._true_implications.append([x, y, z])
```

This makes `conditional_mutual_information` consistent with `partial_correlation`, which already correctly uses a p-value threshold.

## Trade-offs

- The fix uses `scipy.stats.chi2` which is already a project dependency (used elsewhere in `cit.py`).
- The asymptotic chi-squared approximation is valid for reasonably large samples. For very small samples (n < ~30 per cell), a permutation test would be more exact, but that's a separate enhancement.

## Test Status

All existing graph refutation tests pass:

```
tests/test_causal_model.py::TestCausalModel::test_graph_refutation[5-5000] PASSED
tests/test_causal_model.py::TestCausalModel::test_graph_refutation2[10-5000] PASSED
2 passed in 12.57s

Note

🔒 Integrity filtering filtered 111 items

Integrity filtering activated and filtered the following items during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.

issue:[Repo Assist] Monthly Activity 2026-03 #1403 (issue_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:Unable to estimate causal effect with intermediary variable? #69 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:ImportError: weighting_sampler is not an existing do sampler. #71 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:DoSampler: Fails when you Add a Graph to the Causal Model #83 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:Errors of Notebook: The Causal Story Behind Hotel Booking Cancellations #198 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:Mediation Analysis key error #214 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:arrays used as indices must be of integer (or boolean) type #225 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:How the Identification Works and About Two Outcomes #240 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:How to get the control group with propensity score matching? #242 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:Mediation analysis: how to choose mediators? #249 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:Placebo refuter with multiple treatments #251 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:Clarification in identification of estimands and estimation using backdoor.linear_regression #255 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:CATE: Continuous Outcome and Binary/Categorical Variables #257 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:Couple of questions on causal_identifier.py #264 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:Clarification on implications of refutation tests #285 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
issue:DoWhy Logistic Regression with Stats Api #296 (list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
... and 95 more items

Generated by Repo Assist · ◷

To install this agentic workflow, run
gh aw add githubnext/agentics/workflows/repo-assist.md@b897c2f3e43bde9ff7923c8fa9211055b26e27cc

The conditional_mutual_information() method in GraphRefuter was comparing raw conditional mutual information (in bits) against 0.05 as if it were a p-value. This is statistically incorrect: CMI is an information-theoretic quantity, not a probability. Fix: apply the G-test (likelihood ratio test). Under the null hypothesis of conditional independence, the test statistic G = 2 * N * CMI_nats is asymptotically chi-squared with df = (|X|-1)(|Y|-1)*|Z_combos| degrees of freedom. Convert CMI from bits to nats (multiply by ln 2), compute the chi-squared survival function to get a proper p-value, then compare against α = 0.05 — consistent with partial_correlation's approach. Closes #413 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Fixes statistical correctness in GraphRefuter.conditional_mutual_information() by converting conditional mutual information (CMI, in bits) into a chi-squared (G-test) p-value before applying the 0.05 threshold, aligning semantics with partial_correlation().

Changes:

Add chi-squared survival-function calculation (scipy.stats.chi2.sf) to compute a p-value from CMI via the G-test statistic.
Replace the incorrect cmi_val <= 0.05 comparison with a p-value-based decision rule.
Store p-values (instead of raw CMI) in self._results.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-06T07:47:24Z

dowhy/causal_refuters/graph_refuter.py

    def conditional_mutual_information(self, x=None, y=None, z=None):
-        cmi_val = conditional_MI(data=self._data, x=x, y=y, z=list(z))
+        cmi_bits = conditional_MI(data=self._data, x=x, y=y, z=list(z))
        key = (x, y) + (z,)


conditional_MI expects x and y to be list-like column name containers (it indexes with data[list(x)]). Passing x/y as strings will be treated as an iterable of characters (e.g., 'income' -> ['i','n',...]) and can cause incorrect column selection or KeyErrors. Wrap singleton variables as [x]/[y] (and similarly ensure z is a list of column names) before calling conditional_MI.

Copilot · 2026-04-06T07:47:24Z

dowhy/causal_refuters/graph_refuter.py

+        n = len(self._data)
+        # Convert CMI (bits) to G-test statistic (asymptotically chi-squared under H0)
+        g_stat = 2 * n * cmi_bits * log(2)
+
+        # Degrees of freedom: (|X| - 1)(|Y| - 1) * number of distinct Z combinations
+        x_card = self._data[x].nunique()
+        y_card = self._data[y].nunique()
+        z_card = self._data[list(z)].drop_duplicates().shape[0] if z else 1
+        df = max(1, (x_card - 1) * (y_card - 1) * z_card)


The degrees-of-freedom calculation uses cardinalities from the raw dataframe (nunique() / drop_duplicates()), but conditional_MI internally casts variables to int before computing CMI. For non-int columns (e.g., continuous or discretized-on-the-fly cases), this can make g_stat reflect the int-cast data while df reflects the original data, producing invalid p-values. Compute x_card, y_card, and z_card from the same transformed/discretized data that is used to compute CMI (or refactor to compute both CMI and df from a shared contingency table).

Copilot · 2026-04-06T07:47:24Z

dowhy/causal_refuters/graph_refuter.py

+        df = max(1, (x_card - 1) * (y_card - 1) * z_card)
+
+        p_value = float(chi2.sf(g_stat, df=df))


Using df = max(1, ...) masks degenerate cases where x_card <= 1 or y_card <= 1 (or z_card == 0), for which the chi-squared approximation isn’t meaningful. Consider explicitly handling constant variables/empty strata (e.g., treat as non-rejection with p_value=1.0, or return NotImplemented) rather than forcing df=1.

Suggested change

df = max(1, (x_card - 1) * (y_card - 1) * z_card)

p_value = float(chi2.sf(g_stat, df=df))

df = (x_card - 1) * (y_card - 1) * z_card

if x_card <= 1 or y_card <= 1 or df <= 0:

# Degenerate contingency structure: the chi-squared approximation is not meaningful.

# Treat this as a non-rejection instead of forcing df=1.

p_value = 1.0

else:

p_value = float(chi2.sf(g_stat, df=df))

Copilot · 2026-04-06T07:47:24Z

dowhy/causal_refuters/graph_refuter.py

+        n = len(self._data)
+        # Convert CMI (bits) to G-test statistic (asymptotically chi-squared under H0)
+        g_stat = 2 * n * cmi_bits * log(2)
+
+        # Degrees of freedom: (|X| - 1)(|Y| - 1) * number of distinct Z combinations
+        x_card = self._data[x].nunique()
+        y_card = self._data[y].nunique()
+        z_card = self._data[list(z)].drop_duplicates().shape[0] if z else 1
+        df = max(1, (x_card - 1) * (y_card - 1) * z_card)
+
+        p_value = float(chi2.sf(g_stat, df=df))
+
+        if p_value >= 0.05:
            self._true_implications.append([x, y, z])
-            self._results[key] = [cmi_val, True]
+            self._results[key] = [p_value, True]
        else:
            self._false_implications.append([x, y, z])
-            self._results[key] = [cmi_val, False]
+            self._results[key] = [p_value, False]


This change alters the statistical decision rule by introducing a chi-squared p-value computation, but existing tests only assert overall pass/fail outcomes for random graphs. Adding a focused unit test that checks (1) an (approximately) independent discrete pair yields p_value >= 0.05 and (2) a dependent pair yields p_value < 0.05 would prevent regressions in the p-value computation (including df and unit conversions).

github-actions bot added automation bug Something isn't working repo-assist labels Mar 31, 2026

emrekiciman marked this pull request as ready for review April 1, 2026 07:19

This was referenced Apr 1, 2026

[Repo Assist] Monthly Activity 2026-04 #1433

Open

Adding model validation after user specifies model #412

Open

emrekiciman requested a review from Copilot April 6, 2026 07:44

Copilot started reviewing on behalf of emrekiciman April 6, 2026 07:45 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Repo Assist] fix: compute proper chi-squared p-value for CMI independence test in GraphRefuter#1431

[Repo Assist] fix: compute proper chi-squared p-value for CMI independence test in GraphRefuter#1431
github-actions[bot] wants to merge 1 commit intomainfrom
repo-assist/fix-cmi-pvalue-413-85077bb53e7594ea

github-actions bot commented Mar 31, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		df = max(1, (x_card - 1) * (y_card - 1) * z_card)

		p_value = float(chi2.sf(g_stat, df=df))

-        df = max(1, (x_card - 1) * (y_card - 1) * z_card)
-        p_value = float(chi2.sf(g_stat, df=df))
+        df = (x_card - 1) * (y_card - 1) * z_card
+        if x_card <= 1 or y_card <= 1 or df <= 0:
+            # Degenerate contingency structure: the chi-squared approximation is not meaningful.
+            # Treat this as a non-rejection instead of forcing df=1.
+            p_value = 1.0
+        else:
+            p_value = float(chi2.sf(g_stat, df=df))

Conversation

github-actions bot commented Mar 31, 2026

Summary

Root Cause

Fix

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant