fix(equiv): Linux-canonical skip on non-Linux (#213) #216

Merged
k-yoshimi merged 5 commits into develop from equiv-linux-canonical on May 26, 2026
@k-yoshimi k-yoshimi commented May 26, 2026

Summary

Closes #213. Ends the macOS-only 1e-10 equivalence false-positive class without skipping for invisibility reasons. Equivalence tests are declared Linux-canonical: Ubuntu CI runner with gfortran 13.x is the authoritative environment; non-Linux platforms skip with an explicit, policy-documented reason.

What's in this PR (4 commits)

  1. test(equiv): Linux-canonical skipUnless on all 7 modules: IS_LINUX = sys.platform.startswith("linux") plus an outermost @unittest.skipUnless(IS_LINUX, ...) decorator on TestEquivalence in eqlib/trlib/tilib/fplib/wrlib/wrxlib/totlib. The skip message references docs/baseline-policy.md.
  2. docs: baseline-policy.md — Linux-canonical equivalence — new top-level policy doc (~140 lines) covering: what the 1e-10 contract asserts/does-not-assert, canonical platform, non-Linux behavior, local-verification via Linux container, prerequisites to promote a new platform to canonical.
  3. docs(baseline-policy): link follow-up issue #215. Replaces the gh-search placeholder with a direct link to #215 (Python fixture parity for the 6-8 dead-baseline equivalence cases).
  4. docs(baseline-policy): align predicate prose with code — Codex pre-push LOW fix: prose now says sys.platform.startswith("linux") matching the actual code.
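
The gating in item 1 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `build_gated_case`, `run_case`, the abbreviated `LINUX_REASON` string, and the inner "shared library not built" guard are all hypothetical stand-ins. The `is_linux` flag is passed in as a parameter so the behavior can be exercised on any host.

```python
import sys
import unittest

# Same predicate the PR describes; True on Linux hosts (including WSL and containers).
IS_LINUX = sys.platform.startswith("linux")

# Abbreviated stand-in for the real skip message, which is elided in the PR text.
LINUX_REASON = "Equivalence tests are Linux-canonical. See docs/baseline-policy.md."


def build_gated_case(is_linux: bool, lib_available: bool = True) -> type:
    """Build a TestCase gated like item 1 describes (hypothetical helper)."""

    @unittest.skipUnless(is_linux, LINUX_REASON)  # outermost: applied last
    @unittest.skipUnless(lib_available, "shared library not built")  # stand-in for existing guards
    class TestEquivalence(unittest.TestCase):
        def test_placeholder(self):
            self.assertTrue(True)

    return TestEquivalence


def run_case(case: type) -> unittest.TestResult:
    """Run one TestCase class and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    result = run_case(build_gated_case(IS_LINUX))
    print(f"skipped={len(result.skipped)} failures={len(result.failures)}")
```

Because a class-level skip decorator overwrites the skip reason set by any decorator applied before it, keeping the Linux check outermost means a non-Linux host reports the policy message even when an inner guard would also have skipped.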

Why Linux-canonical, not platform-keyed

The platform-keyed approach (committed as design history in docs/superpowers/specs/2026-05-26-platform-keyed-baselines-design.md, marked SUPERSEDED) was rejected after Codex execution-blocker review surfaced two structural prerequisites:

  1. Graphics-stubs gap: 6 of 7 modules can't build standalone Fortran binaries on macOS (only tot has tot_static_stubs.f90). Memory reference_clavius_baseline_regen.md already documents this.
  2. Python-fixture gap: 6-8 of 20 baselines lack <case>_params.py. Filed as #215 (Python fixture parity for the 6-8 dead-baseline equivalence cases).

Both gaps would need to close before per-platform baseline regen is feasible. Until then: Linux-canonical is the principled position.
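
As a rough illustration of how the Python-fixture gap could be audited, the sketch below lists baseline cases that have no matching `<case>_params.py`. The directory layout is an assumption made for the example (one baseline file per case, named after the case); the helper name `cases_missing_params` is hypothetical and does not exist in the repository.

```python
from pathlib import Path


def cases_missing_params(baseline_dir: Path, fixture_dir: Path) -> list[str]:
    """Return case names that have a baseline file but no <case>_params.py.

    Assumed layout (hypothetical): each baseline is a single file whose
    stem is the case name, and fixtures live in a separate directory.
    """
    cases = sorted({p.stem for p in baseline_dir.iterdir() if p.is_file()})
    return [c for c in cases if not (fixture_dir / f"{c}_params.py").is_file()]
```

An empty result for every module would be one signal that this prerequisite has closed; the graphics-stubs gap would still need to be checked separately.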

Verification

macOS dev rig (Homebrew GCC 15.2.0):

$ pytest --forked --timeout=120 --timeout-method=signal \
    python/{eqlib,trlib,tilib,fplib,wrlib,wrxlib,totlib}/tests/test_equivalence.py
15 skipped in 0.72s

The 4 #213 failures (fp_dt1 RPCT, fp_iter01 RPCT, wrx_demo, wrx_iter01.pwr_tot) are gone — replaced with documented skips per the policy.

Linux CI (python-tests.yml:323 whole-tree pytest) continues to exercise all 7 equivalence suites on Ubuntu gfortran 13.x — IS_LINUX is True there, behavior unchanged.

Memory + index

feedback_equivalence_must_pass.md (controller-side) extended with a "Two skip classes" section distinguishing invisibility skip (FORBIDDEN, the existing rule) from principled platform-scoped skip (ALLOWED when canonical-env CI is documented). This PR is the reference example. Cargo-cult guardrail included.

Acceptance criteria (spec §8)

  • All 7 test_equivalence.py have @skipUnless(IS_LINUX, ...).
  • macOS pytest shows 15 skipped + 0 failures from these suites.
  • docs/baseline-policy.md exists.
  • Memory updated.
  • Old platform-keyed spec + plan have SUPERSEDED banners (landed in F-2).
  • Follow-up #215 (Python fixture parity for the 6-8 dead-baseline equivalence cases) filed.
  • Both pre-push reviews HIGH/MED-free.
  • Linux CI green (verifiable post-push).

Issue references

Closes #213.
Follow-up: #215.

Spec

docs/superpowers/specs/2026-05-26-linux-canonical-equiv-policy-design.md (Codex 3-round SHIP IT).

🤖 Generated with Claude Code


Note

Low Risk
Test gating and documentation only; Linux CI still runs full equivalence; no production or auth paths change.

Overview
Linux-canonical equivalence replaces macOS 1e-10 baseline failures with an explicit, documented skip on non-Linux hosts.

All seven python/*/tests/test_equivalence.py modules now gate TestEquivalence with IS_LINUX = sys.platform.startswith("linux") and an outermost @unittest.skipUnless(IS_LINUX, …) that points reviewers to docs/baseline-policy.md. On Ubuntu CI (gfortran 13.x), behavior is unchanged; on macOS and other non-Linux dev machines, those suites skip instead of failing on libm/compiler drift.

The new docs/baseline-policy.md states what the 1e-10 contract does and does not cover, names Linux CI as canonical, explains WSL/Docker vs host OS, and documents how to verify locally and what must happen before promoting another platform (graphics stubs, Python fixtures, #215).
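
For readers unfamiliar with this kind of contract, a 1e-10 equivalence check might look like the sketch below. Whether the real suites use an absolute or a relative tolerance is not stated in this PR; the absolute form here is an assumption, and `within_contract` is a hypothetical helper name.

```python
import math

TOL = 1e-10  # tolerance named in the PR; treating it as absolute is an assumption


def within_contract(reference, candidate, tol=TOL):
    """True if each candidate value is within `tol` (absolute) of its reference value."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=0.0, abs_tol=tol)
        for r, c in zip(reference, candidate)
    )
```

A check this tight is reasonable for reruns on one toolchain, but differences in libm or compiler code generation between platforms can plausibly exceed it, which is the drift the policy scopes out.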

Reviewed by Cursor Bugbot for commit adb5af7. Bugbot is set up for automated code reviews on this repo.

k-yoshimi and others added 5 commits May 26, 2026 21:59
Equivalence tests are now declared Linux-canonical per
docs/superpowers/specs/2026-05-26-linux-canonical-equiv-policy-design.md.
Each of the 7 test_equivalence.py files (eqlib/trlib/tilib/fplib/
wrlib/wrxlib/totlib) gains:

  IS_LINUX = sys.platform.startswith("linux")

  @unittest.skipUnless(
      IS_LINUX,
      "Equivalence tests are Linux-canonical. ... See "
      "docs/baseline-policy.md.",
  )
  class TestEquivalence(...):

placed outside the existing decorator chain (the previous lib-so-
exists / importable / RUN_OK decorators stay as-is, after the
Linux check).

Verification on the dev rig (macOS Homebrew GCC 15.2.0):

  $ pytest -p no:cacheprovider --forked --timeout=120 \
           --timeout-method=signal \
           python/{eqlib,trlib,tilib,fplib,wrlib,wrxlib,totlib}/tests/test_equivalence.py
  15 skipped in 0.72s

The 4 #213 failures (fp_dt1 RPCT, fp_iter01 RPCT, wrx_demo,
wrx_iter01 pwr_tot) are gone — replaced with documented skips
per the policy.

Linux CI (python-tests.yml:323 whole-tree pytest) continues to run
all 7 suites on Ubuntu gfortran 13.x; the IS_LINUX predicate is
True there so behavior is unchanged.

docs/baseline-policy.md and the memory update follow in the next
two commits.

Spec: docs/superpowers/specs/2026-05-26-linux-canonical-equiv-policy-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New top-level docs/baseline-policy.md describes:
  - What the 1e-10 contract asserts (within-platform reproducibility)
    and what it doesn't (cross-platform bit-equivalence).
  - The canonical platform: Ubuntu CI runner with gfortran 13.x.
  - Non-Linux behavior (skip; WSL + Linux Docker count as Linux per
    sys.platform).
  - How macOS dev verifies locally (Linux container).
  - How a future contributor promotes a new platform to canonical,
    explicitly enumerating the two structural prerequisites
    (graphics-stubs gap + Python-fixture-parity gap) that would
    need to close first.
  - Cross-references to memory + the SUPERSEDED platform-keyed
    design history.

The 7 test_equivalence.py skipUnless messages (prior commit) point
here.

Spec: docs/superpowers/specs/2026-05-26-linux-canonical-equiv-policy-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the gh-search placeholder for the Python-fixture-parity
gap with a direct link to issue #215 (filed 2026-05-26).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex pre-push review LOW: prose said `sys.platform == 'linux'`
but the actual code uses `sys.platform.startswith("linux")`.
Tightened to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex 2026-05-26 PR-216 review MED: line 45 referenced
`memory/reference_clavius_baseline_regen.md` as if it were a
repo-relative path. Memory files actually live outside the repo at
$CLAUDE_MEMORY_DIR; the convention in this repo's other docs is
to use the slug alone (e.g. lines 111/128/131 of this same file).
Aligned line 45 to that convention to avoid Bugbot dead-link
false positives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@k-yoshimi

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit adb5af7.

@k-yoshimi k-yoshimi merged commit 957ee73 into develop May 26, 2026
4 checks passed
@k-yoshimi k-yoshimi deleted the equiv-linux-canonical branch May 27, 2026 22:04