[Repo Assist] fix: compute proper chi-squared p-value for CMI independence test in GraphRefuter#1431
Conversation
The conditional_mutual_information() method in GraphRefuter was comparing raw conditional mutual information (in bits) against 0.05 as if it were a p-value. This is statistically incorrect: CMI is an information-theoretic quantity, not a probability. Fix: apply the G-test (likelihood ratio test). Under the null hypothesis of conditional independence, the test statistic G = 2 * N * CMI_nats is asymptotically chi-squared with df = (|X|-1)(|Y|-1)*|Z_combos| degrees of freedom. Convert CMI from bits to nats (multiply by ln 2), compute the chi-squared survival function to get a proper p-value, then compare against α = 0.05 — consistent with partial_correlation's approach. Closes #413 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes statistical correctness in GraphRefuter.conditional_mutual_information() by converting conditional mutual information (CMI, in bits) into a chi-squared (G-test) p-value before applying the 0.05 threshold, aligning semantics with partial_correlation().
Changes:
- Add chi-squared survival-function calculation (
scipy.stats.chi2.sf) to compute a p-value from CMI via the G-test statistic. - Replace the incorrect
cmi_val <= 0.05comparison with a p-value-based decision rule. - Store p-values (instead of raw CMI) in
self._results.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def conditional_mutual_information(self, x=None, y=None, z=None): | ||
| cmi_val = conditional_MI(data=self._data, x=x, y=y, z=list(z)) | ||
| cmi_bits = conditional_MI(data=self._data, x=x, y=y, z=list(z)) | ||
| key = (x, y) + (z,) |
There was a problem hiding this comment.
conditional_MI expects x and y to be list-like column name containers (it indexes with data[list(x)]). Passing x/y as strings will be treated as an iterable of characters (e.g., 'income' -> ['i','n',...]) and can cause incorrect column selection or KeyErrors. Wrap singleton variables as [x]/[y] (and similarly ensure z is a list of column names) before calling conditional_MI.
| n = len(self._data) | ||
| # Convert CMI (bits) to G-test statistic (asymptotically chi-squared under H0) | ||
| g_stat = 2 * n * cmi_bits * log(2) | ||
|
|
||
| # Degrees of freedom: (|X| - 1)(|Y| - 1) * number of distinct Z combinations | ||
| x_card = self._data[x].nunique() | ||
| y_card = self._data[y].nunique() | ||
| z_card = self._data[list(z)].drop_duplicates().shape[0] if z else 1 | ||
| df = max(1, (x_card - 1) * (y_card - 1) * z_card) |
There was a problem hiding this comment.
The degrees-of-freedom calculation uses cardinalities from the raw dataframe (nunique() / drop_duplicates()), but conditional_MI internally casts variables to int before computing CMI. For non-int columns (e.g., continuous or discretized-on-the-fly cases), this can make g_stat reflect the int-cast data while df reflects the original data, producing invalid p-values. Compute x_card, y_card, and z_card from the same transformed/discretized data that is used to compute CMI (or refactor to compute both CMI and df from a shared contingency table).
| df = max(1, (x_card - 1) * (y_card - 1) * z_card) | ||
|
|
||
| p_value = float(chi2.sf(g_stat, df=df)) |
There was a problem hiding this comment.
Using df = max(1, ...) masks degenerate cases where x_card <= 1 or y_card <= 1 (or z_card == 0), for which the chi-squared approximation isn’t meaningful. Consider explicitly handling constant variables/empty strata (e.g., treat as non-rejection with p_value=1.0, or return NotImplemented) rather than forcing df=1.
| df = max(1, (x_card - 1) * (y_card - 1) * z_card) | |
| p_value = float(chi2.sf(g_stat, df=df)) | |
| df = (x_card - 1) * (y_card - 1) * z_card | |
| if x_card <= 1 or y_card <= 1 or df <= 0: | |
| # Degenerate contingency structure: the chi-squared approximation is not meaningful. | |
| # Treat this as a non-rejection instead of forcing df=1. | |
| p_value = 1.0 | |
| else: | |
| p_value = float(chi2.sf(g_stat, df=df)) |
| n = len(self._data) | ||
| # Convert CMI (bits) to G-test statistic (asymptotically chi-squared under H0) | ||
| g_stat = 2 * n * cmi_bits * log(2) | ||
|
|
||
| # Degrees of freedom: (|X| - 1)(|Y| - 1) * number of distinct Z combinations | ||
| x_card = self._data[x].nunique() | ||
| y_card = self._data[y].nunique() | ||
| z_card = self._data[list(z)].drop_duplicates().shape[0] if z else 1 | ||
| df = max(1, (x_card - 1) * (y_card - 1) * z_card) | ||
|
|
||
| p_value = float(chi2.sf(g_stat, df=df)) | ||
|
|
||
| if p_value >= 0.05: | ||
| self._true_implications.append([x, y, z]) | ||
| self._results[key] = [cmi_val, True] | ||
| self._results[key] = [p_value, True] | ||
| else: | ||
| self._false_implications.append([x, y, z]) | ||
| self._results[key] = [cmi_val, False] | ||
| self._results[key] = [p_value, False] |
There was a problem hiding this comment.
This change alters the statistical decision rule by introducing a chi-squared p-value computation, but existing tests only assert overall pass/fail outcomes for random graphs. Adding a focused unit test that checks (1) an (approximately) independent discrete pair yields p_value >= 0.05 and (2) a dependent pair yields p_value < 0.05 would prevent regressions in the p-value computation (including df and unit conversions).
🤖 This PR was created by Repo Assist, an automated AI assistant.
Summary
Fixes a statistical correctness bug in the
conditional_mutual_information()method ofGraphRefuterwhere raw conditional mutual information (in bits) was being compared against0.05as if it were a p-value.Closes #413
Root Cause
conditional_MI()returns an information-theoretic quantity (CMI in bits), not a probability. The old code:CMI (bits) is always ≥ 0 and can be arbitrarily large. Comparing it to
0.05has no statistical meaning and gives unreliable independence test results.Fix
Apply the G-test (log-likelihood ratio test for independence). Under the null hypothesis of conditional independence, the test statistic
G = 2N · CMI_natsis asymptotically chi-squared with degrees of freedom(|X|−1)(|Y|−1) · |Z_combos|. Converting CMI from bits to nats and computing the chi-squared survival function yields a proper p-value:Note
🔒 Integrity filtering filtered 111 items
Integrity filtering activated and filtered the following items during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.
issue_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)