Follow-up from PR #32 (which added check-phi). The gap was acknowledged in that PR but never tracked.
Problem
check-phi/check-phi.py carries several heuristic PHI detectors (MRN/SSN/phone/email/date-of-birth patterns, csv_phi_header column-name scan, etc.), but there is no automated test coverage for them. The check-phi/ directory contains only action.yml and check-phi.py; the repo's only test artifact is .github/workflows/_selftest.yml (an in-repo self-scan), so the detectors are exercised only by crafted live inputs, not a repeatable unit suite.
During PR #32 review the author noted "No unit-test harness exists in this repo to add a fixture to," and the reviewer flagged the csv_phi_header diff-scope edge case (it only fires on line 1 of a diff hunk) as "worth a fixture test if coverage is incomplete."
Why it matters
These detectors gate PRs for protected health information: false negatives are a compliance risk and false positives are friction. Regex/heuristic detectors are exactly the kind of code that regresses silently during a refactor when no tests pin down its behavior.
Proposed fix
- Add a small Python test harness for check-phi.py (e.g. pytest under check-phi/tests/), invoking the detector functions directly.
- Add positive/negative fixtures per detector, including the csv_phi_header diff-scope edge case.
- Wire the suite into _selftest.yml so it runs in CI.
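To make the proposal concrete, here is a rough sketch of what a first test file might look like. It is only a starting point under stated assumptions: the real function names and signatures in check-phi.py are not shown in this issue, so `detect_ssn`, `csv_header_hit`, `PHI_HEADERS`, and the fixture data below are hypothetical stand-ins that let the sketch run on its own. The one piece that should carry over directly is `load_module`, since a file named check-phi.py cannot be imported with a plain `import` statement.

```python
import importlib.util
import re


def load_module(path, name="check_phi"):
    """Import a Python file by path ('check-phi.py' is not a valid module name)."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# --- Hypothetical stand-ins (NOT the real check-phi.py logic) ---------------
_SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHI_HEADERS = {"mrn", "ssn", "dob", "patient_name"}


def detect_ssn(line):
    """Stand-in detector: True if the line contains an SSN-shaped token."""
    return bool(_SSN_RE.search(line))


def csv_header_hit(hunk_lines):
    """Stand-in that looks only at the first line of a hunk, mirroring the
    diff-scope behavior described in the PR #32 review."""
    if not hunk_lines:
        return False
    columns = {col.strip().lower() for col in hunk_lines[0].split(",")}
    return bool(columns & PHI_HEADERS)


# --- Fixtures: (input, expected) pairs, positive and negative ---------------
SSN_CASES = [
    ("ssn: 123-45-6789", True),      # positive
    ("build 2024-01-15 ok", False),  # ISO date must not match
    ("phone 555-123-4567", False),   # phone number must not match
]
HEADER_FIRST = ["mrn,visit_date,notes", "123,2024-01-01,ok"]
HEADER_LATER = ["123,2024-01-01,ok", "mrn,visit_date,notes"]


def run_cases(detector, cases):
    """Return the (input, expected, got) triples that did not match."""
    return [(text, want, detector(text))
            for text, want in cases if detector(text) != want]


# --- Tests (pytest collects functions named test_*) -------------------------
def test_ssn_detector():
    assert run_cases(detect_ssn, SSN_CASES) == []


def test_csv_header_only_first_hunk_line():
    assert csv_header_hit(HEADER_FIRST) is True
    # Pins the edge case: a header that is not line 1 of the hunk is missed.
    assert csv_header_hit(HEADER_LATER) is False
```

In a real suite the stand-ins would be replaced by functions pulled from `load_module(...)` pointed at check-phi.py, and the fixture tables could be fed through `pytest.mark.parametrize` for per-case reporting. A `python -m pytest check-phi/tests/` step in _selftest.yml would then cover the CI wiring. One caveat worth checking first: if check-phi.py runs its scan at import time rather than under an `if __name__ == "__main__":` guard, loading it this way would trigger that scan.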
Low-to-medium priority, but worth doing while the detector behavior is fresh.