check-phi: add an automated test harness + fixtures for the PHI detectors #60

@d-morrison

Follow-up from PR #32 (which added check-phi). The gap was acknowledged in that PR but never tracked.

Problem

check-phi/check-phi.py carries several heuristic PHI detectors (MRN, SSN, phone, email, and date-of-birth patterns, the csv_phi_header column-name scan, etc.), but none of them has automated test coverage. The check-phi/ directory contains only action.yml and check-phi.py, and the repo's only test artifact is .github/workflows/_selftest.yml (an in-repo self-scan). As a result, the detectors are exercised only by crafted live inputs, not by a repeatable unit suite.

During review of PR #32, the author noted "No unit-test harness exists in this repo to add a fixture to," and the reviewer flagged the csv_phi_header diff-scope edge case (it only fires on line 1 of a diff hunk) as "worth a fixture test if coverage is incomplete."
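To make the diff-scope edge case concrete, here is a rough sketch of what such a fixture pair might look like. The `csv_phi_header` function below is a hypothetical stand-in written only for illustration; its signature, the `PHI_COLUMNS` set, and the fixture contents are assumptions and are not taken from check-phi.py. All data is synthetic.

```python
# Hypothetical stand-in for the real csv_phi_header detector. The column names
# and the function signature are assumptions, not taken from check-phi.py.
PHI_COLUMNS = {"mrn", "ssn", "dob", "patient_name"}


def csv_phi_header(hunk_lines):
    """Return PHI-like column names found on the first line of a hunk only."""
    if not hunk_lines:
        return []
    columns = [c.strip().lower() for c in hunk_lines[0].split(",")]
    return [c for c in columns if c in PHI_COLUMNS]


# Positive fixture: the CSV header is the first line of the hunk.
HEADER_FIRST = ["id,MRN,dob,visit_count", "1,000123,1980-02-03,4"]

# Edge-case fixture: the header is present but is not the first line.
HEADER_SECOND = ["# exported file", "id,MRN,dob,visit_count", "1,000123,1980-02-03,4"]


def test_header_on_first_line_is_flagged():
    assert csv_phi_header(HEADER_FIRST) == ["mrn", "dob"]


def test_header_below_first_line_is_missed():
    # Pins the line-1-only behavior so any later change to it is deliberate.
    assert csv_phi_header(HEADER_SECOND) == []
```

Whether the second test should assert the current behavior or be marked as an expected failure depends on whether the line-1-only scope is intended.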

Why it matters

These detectors gate PRs for protected health information: false negatives are a compliance risk, and false positives add friction. Regex and heuristic detectors are exactly the kind of code that silently regresses during a refactor when there are no tests.

Proposed fix

  • Add a small Python test harness for check-phi.py (e.g. pytest under check-phi/tests/), invoking the detector functions directly.
  • Add positive/negative fixtures per detector, including the csv_phi_header diff-scope edge case.
  • Wire the suite into _selftest.yml so it runs in CI.
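
A minimal sketch of how the harness could start, under some assumptions. Because the file name check-phi.py contains a hyphen, a plain `import` statement will not work, so the sketch loads it by path with `importlib`. The `load_module` helper, the `detect_ssn` stand-in, and the `SSN_CASES` table are all hypothetical names introduced here; the real detector functions and patterns in check-phi.py may look different. Loading by path also runs the file's top-level code, so this assumes the script guards its entry point with `if __name__ == "__main__":`.

```python
import importlib.util
import re


def load_module(path, name="check_phi"):
    """Import a file such as check-phi.py, whose name is not a valid identifier."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file's top-level code
    return module


# Hypothetical stand-in detector so the sketch runs on its own; in the real
# suite this would come from load_module("check-phi/check-phi.py").
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def detect_ssn(line):
    return bool(SSN_RE.search(line))


# Positive and negative fixtures as (input, expected) pairs; all values synthetic.
SSN_CASES = [
    ("ssn: 123-45-6789", True),
    ("released 2024-01-15", False),
    ("part no. 1234-56-7890", False),
]


def test_detect_ssn():
    for text, expected in SSN_CASES:
        assert detect_ssn(text) is expected, text
```

With pytest, the loop could be replaced by `@pytest.mark.parametrize` so that each fixture reports as its own test case.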

Low-to-medium priority, but worth doing while the detector behavior is fresh.
