[Repo Assist] fix: compute proper chi-squared p-value for CMI independence test in GraphRefuter #1431

Open

github-actions[bot] wants to merge 1 commit into main from
repo-assist/fix-cmi-pvalue-413-85077bb53e7594ea

Conversation

@github-actions
Contributor

🤖 This PR was created by Repo Assist, an automated AI assistant.

## Summary

Fixes a statistical correctness bug in the `conditional_mutual_information()` method of `GraphRefuter`, where raw conditional mutual information (in bits) was compared against 0.05 as if it were a p-value.

Closes #413

## Root Cause

`conditional_MI()` returns an information-theoretic quantity (CMI in bits), not a probability. The old code:

```python
cmi_val = conditional_MI(data=self._data, x=x, y=y, z=list(z))
if cmi_val <= 0.05:  # wrong: entropy bits ≠ p-value
    self._true_implications.append([x, y, z])
```

CMI in bits is always ≥ 0 and can be arbitrarily large, so comparing it to 0.05 has no statistical meaning and makes the independence test unreliable.
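To make the mismatch concrete: a plug-in MI estimate for a variable paired with itself equals that variable's empirical entropy, which easily dwarfs 0.05. A minimal self-contained sketch (illustrative code, not from the project; `mutual_information_bits` is defined here only for the demonstration):

```python
import numpy as np

def mutual_information_bits(x, y):
    """Plug-in estimate of I(X;Y) in bits from two discrete samples."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    # I(X;Y) = sum over (a,b) of p(a,b) * log2( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * np.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

rng = np.random.default_rng(0)
x = rng.integers(0, 4, size=1000)
# X carries its full entropy (~2 bits) of information about itself, far above
# 0.05 -- yet 0.05 was being used as if it separated dependence from independence.
print(mutual_information_bits(x, x))
```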

## Fix

Apply the G-test (log-likelihood ratio test for independence). Under the null hypothesis of conditional independence, the test statistic G = 2N · CMI_nats is asymptotically chi-squared with (|X|−1)(|Y|−1) · |Z_combos| degrees of freedom. Converting CMI from bits to nats and evaluating the chi-squared survival function yields a proper p-value:

```python
g_stat = 2 * n * cmi_bits * log(2)        # convert to G-test statistic
df = max(1, (x_card - 1) * (y_card - 1) * z_card)
p_value = float(chi2.sf(g_stat, df=df))

if p_value >= 0.05:   # same semantics as partial_correlation
    self._true_implications.append([x, y, z])
```

This makes `conditional_mutual_information` consistent with `partial_correlation`, which already correctly uses a p-value threshold.
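End to end, the corrected decision rule can be sketched as one standalone function. This is a hedged illustration: `cmi_independence_pvalue` and its signature are invented for this example; the real method operates on `self._data` and records results in `self._results`.

```python
import math

import pandas as pd
from scipy.stats import chi2

def cmi_independence_pvalue(data, x, y, z, cmi_bits):
    """G-test p-value for H0: X independent of Y given Z.

    `data`: DataFrame of discrete columns; `x`/`y`: column names; `z`: list of
    conditioning columns; `cmi_bits`: plug-in CMI estimate in bits.
    """
    n = len(data)
    g_stat = 2.0 * n * cmi_bits * math.log(2)   # bits -> nats -> G statistic
    x_card = data[x].nunique()
    y_card = data[y].nunique()
    z_card = data[list(z)].drop_duplicates().shape[0] if z else 1
    df = (x_card - 1) * (y_card - 1) * z_card
    if df <= 0:
        return 1.0          # degenerate (constant variable): do not reject
    return float(chi2.sf(g_stat, df=df))

# Zero CMI means no evidence against independence, so p = 1.0:
toy = pd.DataFrame({"a": [0, 1, 0, 1], "b": [0, 0, 1, 1]})
print(cmi_independence_pvalue(toy, "a", "b", [], 0.0))  # 1.0
```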

## Trade-offs

- The fix uses `scipy.stats.chi2` which is already a project dependency (used elsewhere in `cit.py`).
- The asymptotic chi-squared approximation is valid for reasonably large samples. For very small samples (n < ~30 per cell), a permutation test would be more exact, but that's a separate enhancement.
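For reference, such a permutation test could look like the sketch below: shuffling X within each stratum of Z breaks any X-Y association while preserving the X|Z and Y|Z margins. Everything here, including the user-supplied `cmi_fn` estimator, is hypothetical and not existing project code.

```python
import numpy as np

def permutation_pvalue(x, y, z, cmi_fn, n_perm=500, rng=None):
    """Permutation p-value for H0: X independent of Y given Z.

    x, y: 1-D arrays of discrete values; z: 1-D array of stratum labels;
    cmi_fn(x, y, z) -> dependence estimate (e.g. conditional MI).
    """
    rng = rng or np.random.default_rng()
    observed = cmi_fn(x, y, z)
    exceed = 0
    for _ in range(n_perm):
        x_perm = x.copy()
        for s in np.unique(z):              # shuffle x only within each z-stratum
            idx = np.flatnonzero(z == s)
            x_perm[idx] = rng.permutation(x_perm[idx])
        if cmi_fn(x_perm, y, z) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)      # add-one smoothing avoids p == 0
```

Unlike the chi-squared approximation, the null distribution here is built from the data itself, so it remains valid for small or sparse samples at the cost of `n_perm` extra CMI evaluations.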

## Test Status

All existing graph refutation tests pass:

```
tests/test_causal_model.py::TestCausalModel::test_graph_refutation[5-5000] PASSED
tests/test_causal_model.py::TestCausalModel::test_graph_refutation2[10-5000] PASSED
2 passed in 12.57s
```

## Note

🔒 Integrity filtering filtered 111 items

Integrity filtering activated and filtered the following items during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.

Generated by Repo Assist. To install this agentic workflow, run:

```
gh aw add githubnext/agentics/workflows/repo-assist.md@b897c2f3e43bde9ff7923c8fa9211055b26e27cc
```

The conditional_mutual_information() method in GraphRefuter was comparing
raw conditional mutual information (in bits) against 0.05 as if it were
a p-value. This is statistically incorrect: CMI is an information-theoretic
quantity, not a probability.

Fix: apply the G-test (likelihood ratio test). Under the null hypothesis
of conditional independence, the test statistic G = 2 * N * CMI_nats is
asymptotically chi-squared with df = (|X|-1)(|Y|-1)*|Z_combos| degrees
of freedom. Convert CMI from bits to nats (multiply by ln 2), compute
the chi-squared survival function to get a proper p-value, then compare
against α = 0.05, consistent with partial_correlation's approach.

Closes #413

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI (Contributor) left a comment:


Pull request overview

Fixes statistical correctness in GraphRefuter.conditional_mutual_information() by converting conditional mutual information (CMI, in bits) into a chi-squared (G-test) p-value before applying the 0.05 threshold, aligning semantics with partial_correlation().

Changes:

  • Add chi-squared survival-function calculation (scipy.stats.chi2.sf) to compute a p-value from CMI via the G-test statistic.
  • Replace the incorrect cmi_val <= 0.05 comparison with a p-value-based decision rule.
  • Store p-values (instead of raw CMI) in self._results.


Comment on lines 60 to 62:

```diff
 def conditional_mutual_information(self, x=None, y=None, z=None):
-    cmi_val = conditional_MI(data=self._data, x=x, y=y, z=list(z))
+    cmi_bits = conditional_MI(data=self._data, x=x, y=y, z=list(z))
     key = (x, y) + (z,)
```

Copilot AI Apr 6, 2026


conditional_MI expects x and y to be list-like column name containers (it indexes with data[list(x)]). Passing x/y as strings will be treated as an iterable of characters (e.g., 'income' -> ['i','n',...]) and can cause incorrect column selection or KeyErrors. Wrap singleton variables as [x]/[y] (and similarly ensure z is a list of column names) before calling conditional_MI.

Comment on lines +64 to +72:

```diff
+    n = len(self._data)
+    # Convert CMI (bits) to G-test statistic (asymptotically chi-squared under H0)
+    g_stat = 2 * n * cmi_bits * log(2)
+
+    # Degrees of freedom: (|X| - 1)(|Y| - 1) * number of distinct Z combinations
+    x_card = self._data[x].nunique()
+    y_card = self._data[y].nunique()
+    z_card = self._data[list(z)].drop_duplicates().shape[0] if z else 1
+    df = max(1, (x_card - 1) * (y_card - 1) * z_card)
```

Copilot AI Apr 6, 2026


The degrees-of-freedom calculation uses cardinalities from the raw dataframe (nunique() / drop_duplicates()), but conditional_MI internally casts variables to int before computing CMI. For non-int columns (e.g., continuous or discretized-on-the-fly cases), this can make g_stat reflect the int-cast data while df reflects the original data, producing invalid p-values. Compute x_card, y_card, and z_card from the same transformed/discretized data that is used to compute CMI (or refactor to compute both CMI and df from a shared contingency table).
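The shared-contingency-table refactor the comment proposes could be sketched like this (illustrative standalone code with hypothetical names; in the real method the per-stratum tables would be built from whatever transformed data `conditional_MI` actually uses):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

def g_test_from_table(data, x, y, z):
    """Conditional G statistic, df, and p-value from one set of per-stratum
    contingency tables, so statistic and df always see the same data."""
    g, df = 0.0, 0
    groups = data.groupby(list(z)) if z else [(None, data)]
    for _, stratum in groups:
        table = pd.crosstab(stratum[x], stratum[y]).to_numpy().astype(float)
        n = table.sum()
        if n == 0 or table.shape[0] < 2 or table.shape[1] < 2:
            continue                      # empty/degenerate stratum adds nothing
        expected = np.outer(table.sum(1), table.sum(0)) / n
        mask = table > 0                  # zero cells contribute 0 to G
        g += 2.0 * np.sum(table[mask] * np.log(table[mask] / expected[mask]))
        df += (table.shape[0] - 1) * (table.shape[1] - 1)
    p = float(chi2.sf(g, df)) if df > 0 else 1.0
    return g, df, p
```

Because G and df come from the same per-stratum tables, they cannot drift apart the way separately computed CMI and cardinalities can.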

Comment on lines +72 to +74:

```diff
+    df = max(1, (x_card - 1) * (y_card - 1) * z_card)
+
+    p_value = float(chi2.sf(g_stat, df=df))
```

Copilot AI Apr 6, 2026


Using df = max(1, ...) masks degenerate cases where x_card <= 1 or y_card <= 1 (or z_card == 0), for which the chi-squared approximation isn’t meaningful. Consider explicitly handling constant variables/empty strata (e.g., treat as non-rejection with p_value=1.0, or return NotImplemented) rather than forcing df=1.

Suggested change:

```python
df = max(1, (x_card - 1) * (y_card - 1) * z_card)
p_value = float(chi2.sf(g_stat, df=df))
```

to:

```python
df = (x_card - 1) * (y_card - 1) * z_card
if x_card <= 1 or y_card <= 1 or df <= 0:
    # Degenerate contingency structure: the chi-squared approximation is not
    # meaningful. Treat this as a non-rejection instead of forcing df=1.
    p_value = 1.0
else:
    p_value = float(chi2.sf(g_stat, df=df))
```

Comment on lines +64 to +81:

```diff
+    n = len(self._data)
+    # Convert CMI (bits) to G-test statistic (asymptotically chi-squared under H0)
+    g_stat = 2 * n * cmi_bits * log(2)
+
+    # Degrees of freedom: (|X| - 1)(|Y| - 1) * number of distinct Z combinations
+    x_card = self._data[x].nunique()
+    y_card = self._data[y].nunique()
+    z_card = self._data[list(z)].drop_duplicates().shape[0] if z else 1
+    df = max(1, (x_card - 1) * (y_card - 1) * z_card)
+
+    p_value = float(chi2.sf(g_stat, df=df))
+
+    if p_value >= 0.05:
+        self._true_implications.append([x, y, z])
-        self._results[key] = [cmi_val, True]
+        self._results[key] = [p_value, True]
+    else:
+        self._false_implications.append([x, y, z])
-        self._results[key] = [cmi_val, False]
+        self._results[key] = [p_value, False]
```

Copilot AI Apr 6, 2026


This change alters the statistical decision rule by introducing a chi-squared p-value computation, but existing tests only assert overall pass/fail outcomes for random graphs. Adding a focused unit test that checks (1) an (approximately) independent discrete pair yields p_value >= 0.05 and (2) a dependent pair yields p_value < 0.05 would prevent regressions in the p-value computation (including df and unit conversions).
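A sketch of such a focused test, written against a self-contained marginal G-test stand-in rather than the refuter's real internals (all names are illustrative):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

def g_test_pvalue(data, x, y):
    """Unconditional G-test p-value; a stand-in for the refuter's computation."""
    table = pd.crosstab(data[x], data[y]).to_numpy().astype(float)
    n = table.sum()
    expected = np.outer(table.sum(1), table.sum(0)) / n
    mask = table > 0
    g = 2.0 * np.sum(table[mask] * np.log(table[mask] / expected[mask]))
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return float(chi2.sf(g, dof))

def test_dependent_pair_rejected():
    rng = np.random.default_rng(42)
    a = rng.integers(0, 3, 5000)
    noisy = (a + rng.integers(0, 2, 5000)) % 3   # b depends strongly on a
    assert g_test_pvalue(pd.DataFrame({"a": a, "b": noisy}), "a", "b") < 0.05

def test_independent_pair_not_rejected():
    rng = np.random.default_rng(42)
    data = pd.DataFrame({"a": rng.integers(0, 3, 5000),
                         "b": rng.integers(0, 3, 5000)})
    # Under H0 the p-value is uniform on [0, 1], so any fixed seed has a 5%
    # chance of dipping below 0.05; a loose floor keeps the seeded test stable.
    assert g_test_pvalue(data, "a", "b") > 0.001
```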


## Labels

automation · bug (Something isn't working) · repo-assist

## Development

Successfully merging this pull request may close this issue:

- Conditional independence test on mutual information is not a statistical test of independence
1 participant