Skip to content

compat: wire the 42-module parity matrix into CI with per-module trend #812

@proggeramlug

Description

@proggeramlug

Part of #793 (axis 3 — Node.js platform built-in modules with real semantics).

What already exists (committed, not yet CI-wired)

A behavioral parity suite was built and committed over the last week:

  • test-files/test_parity_<module>.ts — 42 files, one per supported node:* module. Every API in the parity reference becomes a console.log("label:", expr) line under per-section try/catch. The byte-for-byte diff against node --experimental-strip-types IS the per-module gap report. Commits: d0e84724, 08c9004a, 1f2f44e5.
  • docs/runtime-parity.md — leaf-level inventory of the entire Node + Bun stdlib API surface (~50 modules, ~4,200 rows: | API | Node.js | Bun | Notes |), sourced from the authoritative Node docs + Bun compat table.
  • docs/runtime-parity-gaps.md — that inventory diffed against Perry's compile-time coverage (manifest + Expr::* HIR variants + js_* FFI exports across perry-runtime / perry-stdlib / 35 perry-ext-* crates). 2,417 true gaps.
  • scripts/gen_parity_tests.py + scripts/parity-skiplist.toml — regenerate skeletons from the inventory; skiplist documents architecturally-out-of-scope modules (vm/inspector/v8/etc.) with rationale.
  • run_parity_tests.sh --filter parity_ — runs just the parity matrix; auto-PERRY_ALLOW_UNIMPLEMENTED=1 so unimplemented APIs surface as runtime divergence instead of hard compile errors; strips DEP/Experimental Node warnings from the diff.

This is ~70% of the axis-3 instrumentation #793 is asking for. It is currently run ad-hoc locally and has repeatedly been cleaned out of the working tree between sessions (now committed, so durable).

What this sub-issue adds

Wire it into CI as a tracked, per-module signal — distinct from #794 (per-category thresholds) and #800 (Node's own test corpus):

  1. CI job runs ./run_parity_tests.sh --filter parity_ on every push to main.
  2. Per-module trend artifact — emit a table of module | node_lines | perry_lines | diff_lines | status so a regression in any single module is visible (not buried in a global %). The diff-line count per module is already computed locally; just persist it as a CI artifact + job annotation.
  3. Baseline lock — store the current per-module diff counts as test-parity/parity_matrix_baseline.json; CI fails if any module's diff grows (regression) and prints a celebratory note when one reaches 0 (full byte-for-byte PASS).
  4. Hard per-test output cap — see the robustness note on CI: stop truncating gap-suite output, add per-test summary #796: one test (test_parity_timers_promises, root cause await on namespace-member that returns undefined re-enters catch arm infinitely (5.7M lines observed) #712) emitted 5.7M identical lines and the runner's O(n²) bash normalize_output burned ~3 hours. The CI job MUST cap per-test stdout (e.g. 50k lines) and the normalizer must be linear, or pathological output DOSes the run.

Current baseline (v0.5.910 sweep, for reference)

Trend across sweeps v0.5.713 → v0.5.910: 0 → 5 PASS as #623/#629/#630/#631/#648/#649/#650/#681/#682/#712/#741/#742 landed.

Done when

  • A CI job produces the per-module diff table as an annotation on every main push.
  • parity_matrix_baseline.json is committed and regressions fail CI.
  • Per-test output is hard-capped and normalize_output is linear-time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions