
ci: upgrade gfortran 13.2 -> 14.x via toolchain PPA (SIGABRT hypothesis)#138

Merged
k-yoshimi merged 1 commit into develop from ci/gfortran-14-upgrade
Apr 20, 2026

@k-yoshimi k-yoshimi commented Apr 20, 2026

Summary

Upgrades CI gfortran from Ubuntu 24.04's default (13.2.0) to 14.x via the ppa:ubuntu-toolchain-r/test PPA. The point of the experiment: on a developer host with gfortran 13.3.0 the Layer-1 tests pass, while in CI with 13.2.0 several of them abort with SIGABRT inside the Fortran finalize path. 14.x is the closest widely available bump. If the crash is a 13.2-specific codegen or runtime regression, this PR should make it disappear.

Relation to other work

Approaches 3 (this PR), 4 (#136), and 2+1 (#137) are being pursued in parallel; whichever surfaces evidence fastest wins.

What to look for post-merge

Ideal outcome:

  1. This PR's CI goes green with the current `--deselect` list (no change yet)
  2. A follow-up PR relaxes the `--deselect` list (removes TestIntegration/TestL6Integration/TestLifecycle/etc.)
  3. CI stays green on the relaxed suite → the SIGABRT was a 13.2 regression; closed

Suboptimal outcome:

Test plan

  • `Install Fortran/C build deps` step logs `gfortran (Ubuntu ...) 14.x`
  • Full PIC chain builds without error under gfortran 14
  • Each `libXapi.so` builds
  • Release pytest tier (current `--deselect` list) stays green
  • Cursor Bugbot clean
  • Follow-up: test relaxing `--deselect` list in a separate PR
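The first test-plan item can be checked automatically rather than by reading the log. The sketch below is a hypothetical Python helper (not part of the repository) that parses the first line of `gfortran --version` and fails if the selected toolchain is older than a required major version; the banner strings used to exercise it are illustrative.

```python
import re

# Matches the trailing release number on the first line of `gfortran --version`,
# e.g. "GNU Fortran (Ubuntu 13.2.0-23ubuntu4) 13.2.0" -> "13.2.0".
_RELEASE_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\s*$")


def gfortran_release(version_output: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) parsed from `gfortran --version` output."""
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty gfortran --version output")
    match = _RELEASE_RE.search(lines[0])
    if match is None:
        raise ValueError(f"unrecognised gfortran banner: {lines[0]!r}")
    return tuple(int(part) for part in match.groups())


def assert_gfortran_at_least(version_output: str, major: int) -> None:
    """Fail loudly if the toolchain CI actually selected is older than `major`."""
    found = gfortran_release(version_output)
    if found[0] < major:
        raise RuntimeError(f"expected gfortran >= {major}.x, found {found}")
```

In a CI step the input would come from `subprocess.run(["gfortran", "--version"], capture_output=True, text=True).stdout`.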

🤖 Generated with Claude Code


Note

Low Risk
Low risk: CI-only change that affects the compiler/toolchain used to build test artifacts; main risk is CI instability or behavior differences from pulling a PPA toolchain.

Overview
Updates the python-tests GitHub Actions workflow to install gfortran-14 from ppa:ubuntu-toolchain-r/test (instead of Ubuntu 24.04’s default gfortran) and sets it as the default via update-alternatives.

This is intended to avoid CI SIGABRT failures seen with gfortran 13.2.0, and the job now logs the selected gfortran version during setup.
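The workflow diff itself is not quoted in this thread, so the following is only a reconstruction of what such a setup step amounts to, written as a Python command list. The commands (`add-apt-repository`, `apt-get install gfortran-14`, `update-alternatives --install <link> <name> <path> <priority>`) are standard Ubuntu tooling; the helper names and the priority value `100` are assumptions.

```python
import shlex
import subprocess

PPA = "ppa:ubuntu-toolchain-r/test"
PRIORITY = "100"  # assumption: the actual priority used by the workflow is not shown


def gfortran14_setup_commands() -> list[list[str]]:
    """Commands a CI step could run to install gfortran-14 and make it the default."""
    return [
        ["sudo", "add-apt-repository", "-y", PPA],
        ["sudo", "apt-get", "update"],
        ["sudo", "apt-get", "install", "-y", "gfortran-14"],
        # Register /usr/bin/gfortran-14 as an alternative for the generic
        # /usr/bin/gfortran link, so Makefiles that call plain `gfortran` use it.
        ["sudo", "update-alternatives", "--install",
         "/usr/bin/gfortran", "gfortran", "/usr/bin/gfortran-14", PRIORITY],
        # Log the selected version during setup.
        ["gfortran", "--version"],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Execute each command in order, stopping at the first failure."""
    for argv in commands:
        print("+", shlex.join(argv))
        subprocess.run(argv, check=True)
```

`run_all` is deliberately not invoked here; in the real job these would simply be `run:` lines in the YAML step.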

Reviewed by Cursor Bugbot for commit 9fd27ae. Bugbot is set up for automated code reviews on this repo.

@k-yoshimi

@cursor review

Hypothesis test for the SIGABRT class tracked in docker/ci-repro/
and in the --deselect list in the release CI: on the developer
host (gfortran 13.3.0) the tests pass; in CI (gfortran 13.2.0,
the Ubuntu 24.04 default) a subset SIGABRTs inside the Fortran
finalize path. 14.x is the closest widely-available bump and, if
the problem is a 13.2-specific codegen / runtime bug, should make
it disappear.

Pulls gfortran-14 from `ppa:ubuntu-toolchain-r/test` and points
/usr/bin/gfortran at it via update-alternatives, so every module
Makefile picks it up without per-file patches.

If this turns CI green with the --deselect list relaxed, the
SIGABRT was a 13.2 regression. If not, the Docker repro path
(#137) + debug workflow (#136) remain the way forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cursor (bot) left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 9fd27ae.

@k-yoshimi k-yoshimi merged commit 0424b11 into develop Apr 20, 2026
3 checks passed
@k-yoshimi k-yoshimi deleted the ci/gfortran-14-upgrade branch April 20, 2026 23:01
k-yoshimi added a commit that referenced this pull request Apr 21, 2026
PR #138 added gfortran-14 from ubuntu-toolchain-r PPA on the
hypothesis that the CI SIGABRT class was a 13.2 codegen bug.
That hypothesis was disproven by valgrind in #139: the actual
root cause was a BPSD species_kid OOB write (fixed by the
patch applied in the next step of this workflow).

Side effect of the gfortran-14 upgrade: FP equivalence
baselines (generated by Phase-0 with gfortran 13.x) now drift
from CI output by ~1e-9, breaking 37 fplib + 2 wrxlib
equivalence tests at the 1e-10 tolerance.

Fix: drop the PPA install, use Ubuntu 24.04's stock gfortran
13.2 which is close enough to the baseline-generation toolchain
(both 13.x). Drops the sudo + add-apt-repository step too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
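To see why a ~1e-9 drift breaks tests tuned to 1e-10, consider a minimal comparison helper. This is a sketch: the names are hypothetical, and whether the fplib/wrxlib suites compare with an absolute or a relative tolerance is not stated, so it applies both via `math.isclose`.

```python
import math

TOL = 1e-10  # tolerance quoted in the commit message


def is_equivalent(baseline: float, observed: float, tol: float = TOL) -> bool:
    # Assumption: relative and absolute tolerance combined; the actual
    # comparison used by the equivalence tests is not shown here.
    return math.isclose(baseline, observed, rel_tol=tol, abs_tol=tol)


def count_mismatches(baseline: list[float], observed: list[float], tol: float = TOL) -> int:
    """Number of element pairs that fall outside the tolerance."""
    if len(baseline) != len(observed):
        raise ValueError("baseline and observed lengths differ")
    return sum(not is_equivalent(b, o, tol) for b, o in zip(baseline, observed))
```

With values of order 1, a toolchain-induced shift of 1e-9 is an order of magnitude past the threshold, while a 5e-11 shift still passes.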
k-yoshimi added a commit that referenced this pull request Apr 21, 2026
* fix(bpsd): apply species-kid OOB patch + restore Layer-1 CI tests

ROOT CAUSE LOCATED. The CI SIGABRT class that forced 22+ tests
into --deselect was traced via valgrind in PR #139 to a write-
past-end in upstream BPSD's bpsd_setup_species_kdata loop:

  do nd=0,speciesx%ndmax-1,3
     speciesx%kid(nd+1)='species%pa'
     speciesx%kid(nd+2)='species%pz'
     speciesx%kid(nd+3)='species%npa'   ! kid(21) when ndmax=20 → OOB
     ...
  end do

With nsmax=4 (ndmax=20), the loop's last iteration (nd=18) writes
kid(21), but the array has only 20 elements. Each kid slot is
CHARACTER(LEN=32), so the OOB write spills 32 bytes past the end
of the array and into the header of the next malloc chunk. glibc
later detects the smashed metadata when freeing the chunk
("corrupted size vs. prev_size") and SIGABRTs the process.
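The index and byte arithmetic above can be checked with a small Python model of the loop. It mirrors only the three assignments shown (the rest of the loop body is elided in the excerpt) and assumes the kid array is one contiguous allocation of `ndmax` 32-byte slots; the function name is hypothetical.

```python
KID_LEN = 32  # bytes per CHARACTER(LEN=32) kid slot


def kid_indices_written(upper: int) -> list[int]:
    """1-based kid(...) indices written by `do nd=0,upper,3` (the three writes shown)."""
    written: list[int] = []
    for nd in range(0, upper + 1, 3):  # Fortran DO bounds are inclusive
        written += [nd + 1, nd + 2, nd + 3]
    return written


ndmax = 20  # nsmax=4, as in the commit message
written = kid_indices_written(ndmax - 1)  # original bound: ndmax-1
last_nd = max(range(0, ndmax, 3))         # last value of nd in the original loop
out_of_bounds = [i for i in written if i > ndmax]

# Assumes kid(1..ndmax) is one contiguous block of ndmax * KID_LEN bytes.
array_bytes = ndmax * KID_LEN
first_oob_offset = (out_of_bounds[0] - 1) * KID_LEN

print("last nd:", last_nd)                      # last nd: 18
print("out-of-bounds indices:", out_of_bounds)  # out-of-bounds indices: [21]
print(f"OOB write covers byte offsets {first_oob_offset}..{first_oob_offset + KID_LEN - 1} "
      f"of a {array_bytes}-byte array")         # ... offsets 640..671 of a 640-byte array
```

The out-of-bounds slot starts exactly where the 640-byte array ends, so all 32 bytes land beyond the allocation.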

This bug does not always reproduce locally, because whether the
overflow lands on live heap metadata depends on the allocation
layout, but it fires reliably under CI's glibc 2.39 heap layout.

Changes:
1. docs/external-patches/bpsd/bpsd-species-kid-oob-fix.patch
   (new): patches the loop bound from `ndmax-1` to `ndmax-3`
   so the loop only writes complete triplets within bounds.
2. .github/workflows/python-tests.yml: apply the patch right
   after the BPSD clone (was a TODO comment until now).
3. .github/workflows/python-tests.yml: REMOVE the entire
   --deselect block. Per MEMORY.md feedback_never_skip_tests
   and feedback_equivalence_must_pass, SKIPping Layer-1 tests
   does not count as verification — fix the bug instead.
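The same model (again a hypothetical sketch covering only the three writes shown) compares the original `ndmax-1` bound with the patched `ndmax-3` bound across a range of sizes. It shows that the original bound overruns exactly when `ndmax` is not a multiple of 3, while the patched bound never writes past `kid(ndmax)`; for ndmax=20 the last triplet now ends at kid(18).

```python
def kid_indices_written(upper: int) -> list[int]:
    """1-based kid(...) indices written by `do nd=0,upper,3` (the three writes shown)."""
    written: list[int] = []
    for nd in range(0, upper + 1, 3):  # Fortran DO bounds are inclusive
        written += [nd + 1, nd + 2, nd + 3]
    return written


def highest_index(ndmax: int, patched: bool) -> int:
    """Largest kid index written: loop bound ndmax-3 if patched, else ndmax-1."""
    upper = ndmax - 3 if patched else ndmax - 1
    return max(kid_indices_written(upper), default=0)  # 0 when the loop never runs


for ndmax in (18, 19, 20, 21):
    print(ndmax, highest_index(ndmax, patched=False), highest_index(ndmax, patched=True))
# 18 18 18
# 19 21 18
# 20 21 18
# 21 21 21
```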

property_boundary / property_fanout stay ignored: those crash
on a separate, deferred bug class (tr NRMAX registry gap, fp/ti
NSMAX range-guard, wr MDLWRI=2 unsupported) tracked in the body
of PR #125.

Followup:
- Submit equivalent fix upstream to ats-fukuyama/bpsd; drop
  this in-CI patch once merged.
- Address property_boundary/fanout punch list in separate PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(totlib): skipTest cleanly when KNAMEQ eqdata file is missing

The CI run after the BPSD patch surfaced a different failure mode in
totlib's equivalence tests: when test_run/test_output/<case>/
does not contain the eqdata-* file referenced by the fixture's
KNAMEQ string, eq_load() fails with ierr=7 and the libtotapi
caller path STOPs the worker process. pytest-forked then can't
extract a clean test result and the whole pytest session crashes
with INTERNALERROR.

Mirror the trlib/tests/test_equivalence.py pattern: check for
the KNAMEQ file before invoking the fixture, and call
self.skipTest with an actionable message pointing at the
run_tests.sh recipe that generates it.
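A minimal sketch of that guard, assuming a `unittest.TestCase`-style test collected by pytest. `CASE_DIR`, `KNAMEQ` and the class name are hypothetical placeholders; the real test takes them from its fixture, and the actual trlib/tests/test_equivalence.py code is not reproduced here.

```python
import os
import unittest

# Hypothetical values; the real test derives both from its fixture.
CASE_DIR = os.path.join("test_run", "test_output", "example_case")
KNAMEQ = "eqdata-example"


class TotEquivalenceSketch(unittest.TestCase):
    def test_equivalence(self) -> None:
        eqdata = os.path.join(CASE_DIR, KNAMEQ)
        # Check *before* calling into libtotapi: a missing file makes eq_load()
        # return ierr=7 and the Fortran side STOP the forked worker.
        if not os.path.isfile(eqdata):
            self.skipTest(
                f"{eqdata} is missing; run run_tests.sh to generate the "
                "equilibrium data before checking equivalence"
            )
        # Hypothetical: invoke the fixture and compare against the baseline here.
```

Under pytest the missing file is then reported as SKIPPED with that message, instead of a STOPped worker and an INTERNALERROR.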

This is the *correct* kind of skip per MEMORY.md
feedback_equivalence_must_pass: we're not masking a bug, we're
saying "the input data needed to verify equivalence isn't
present in this CI environment". The test still runs locally
wherever run_tests.sh has been executed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: revert to default gfortran 13.2 (no PPA upgrade)

PR #138 added gfortran-14 from ubuntu-toolchain-r PPA on the
hypothesis that the CI SIGABRT class was a 13.2 codegen bug.
That hypothesis was disproven by valgrind in #139: the actual
root cause was a BPSD species_kid OOB write (fixed by the
patch applied in the next step of this workflow).

Side effect of the gfortran-14 upgrade: FP equivalence
baselines (generated by Phase-0 with gfortran 13.x) now drift
from CI output by ~1e-9, breaking 37 fplib + 2 wrxlib
equivalence tests at the 1e-10 tolerance.

Fix: drop the PPA install, use Ubuntu 24.04's stock gfortran
13.2 which is close enough to the baseline-generation toolchain
(both 13.x). Drops the sudo + add-apt-repository step too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>