MetricComparisons into InformationImbalance + NeighborhoodOverlap; move k*+ID functions into KStar class #181

Merged
imacocco merged 4 commits into main from refactor_metric_comparisons on May 26, 2026

Conversation

diegodoimo (Collaborator) commented May 17, 2026

Summary

Splits the MetricComparisons class into InformationImbalance and NeighborhoodOverlap, and relocates two k* ID-estimation methods to the KStar class. Fully backward compatible.

Why

The MetricComparisons class is more than 1000 lines long. It contains two conceptually different sets of functions: those that compute information imbalance and those that compute neighborhood overlap, each with its own helper methods.

What changed

New classes

  • InformationImbalance (in dadapy/information_imbalance.py) collects all 17 imbalance and causality methods (return_inf_imb_*, greedy_feature_selection_*, the causality block).
  • NeighborhoodOverlap (in dadapy/neighborhood_overlap.py) contains the neighborhood overlap functions: return_label_overlap, return_data_overlap, and _label_imbalance_helper.
  • Both inherit directly from Base. The shared helper _get_nn_indices is lifted to dadapy/_utils/metric_comparisons.py as a module-level function.
  • Data now inherits from InformationImbalance and NeighborhoodOverlap directly (no longer through MetricComparisons).
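
The layout above can be sketched roughly as follows. This is a minimal illustration, not dadapy's code: `BaseSketch`, `InformationImbalanceSketch`, `NeighborhoodOverlapSketch`, `DataSketch`, `nn_indices`, and both method names are hypothetical stand-ins, and the real `_get_nn_indices` may take different arguments.

```python
# Hypothetical stand-ins for illustration only; not dadapy's implementation.


def nn_indices(points, k):
    """Module-level helper: for each point, the indices of its k nearest
    neighbours (self excluded), ranked by squared Euclidean distance.
    Ties are broken by index, which is an assumption of this sketch."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        result.append([j for _, j in ranked[:k]])
    return result


class BaseSketch:
    """Holds the coordinates, playing the role of the shared base class."""

    def __init__(self, coordinates):
        self.X = coordinates


class InformationImbalanceSketch(BaseSketch):
    def first_neighbour(self):
        # Uses the shared helper instead of a private method on the class.
        return [row[0] for row in nn_indices(self.X, 1)]


class NeighborhoodOverlapSketch(BaseSketch):
    def k_neighbours(self, k):
        # Same helper, reused by the second class.
        return nn_indices(self.X, k)


class DataSketch(InformationImbalanceSketch, NeighborhoodOverlapSketch):
    """Inherits from both classes directly, with no intermediate class."""


d = DataSketch([[0.0], [1.0], [3.0]])
print(d.first_neighbour())
print(d.k_neighbours(2))
```

Because both classes call a module-level function rather than a method defined on a common parent, neither depends on the other, and each can be instantiated on its own.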

Backward-compatibility

  • MetricComparisons shrinks from ~1050 lines to ~35 lines and is kept for backward compatibility:

    class MetricComparisons(InformationImbalance, NeighborhoodOverlap):
        ...

    Existing code (from dadapy import MetricComparisons, instantiation, method calls) works unchanged.

Symmetric constructor API for the comparison classes

  • InformationImbalance(X1, X2).return_information_imbalance()
  • NeighborhoodOverlap(X1, X2).return_data_overlap(k=30)
  • NeighborhoodOverlap(X, labels=y).return_label_overlap(k=5)
  • The existing asymmetric calls via the Data class are unchanged.
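
For readers unfamiliar with the two overlap quantities, here is a rough pure-Python sketch of what they measure. It does not use dadapy and is not its implementation: `knn`, `data_overlap`, and `label_overlap` are hypothetical helpers, and the distance metric and tie-breaking are assumptions of the sketch.

```python
# Illustrative only: hypothetical helpers, not dadapy's implementation.


def knn(points, k):
    """Set of indices of the k nearest neighbours of each point (self
    excluded), by squared Euclidean distance, ties broken by index."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        result.append({j for _, j in ranked[:k]})
    return result


def data_overlap(x1, x2, k):
    """Mean fraction of the k neighbours shared by two representations."""
    n1, n2 = knn(x1, k), knn(x2, k)
    return sum(len(a & b) for a, b in zip(n1, n2)) / (k * len(x1))


def label_overlap(x, labels, k):
    """Mean fraction of the k neighbours carrying the point's own label."""
    neighbours = knn(x, k)
    same = sum(
        labels[j] == labels[i]
        for i, nbrs in enumerate(neighbours)
        for j in nbrs
    )
    return same / (k * len(x))


# Two toy representations of the same four points; only the last one moves.
x1 = [[0.0], [1.0], [2.0], [10.0]]
x2 = [[0.0], [1.0], [2.0], [-10.0]]
labels = [0, 0, 1, 1]

print(data_overlap(x1, x1, 2))        # identical spaces
print(data_overlap(x1, x2, 1))        # last point changes its neighbour
print(label_overlap(x1, labels, 1))
```

The two-argument form compares two representations of the same points, while the `labels` form compares one representation against a class assignment, which mirrors the two NeighborhoodOverlap constructor calls listed above.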

Relocation of two k* ID-estimation methods to the KStar class

  • return_ids_kstar_gride and return_ids_kstar_binomial moved from Data to KStar, where they conceptually belong; I personally don't understand why those two functions were in Data.
  • Data shrank from 243 → 76 lines and is now a pure container.

imacocco merged commit 3e8b255 into main on May 26, 2026
7 checks passed
imacocco deleted the refactor_metric_comparisons branch on May 26, 2026 14:58