Changes from all commits
61 commits
966a5b9
Changed VERSION to 2.9.0
ptrendx Oct 16, 2025
739c656
[JAX] Fix imports in test for deprecated jax.experimental.pjit (#2274)
KshitijLakhani Oct 17, 2025
c2a643d
Wheels for cuda 13 (#2278)
ksivaman Oct 18, 2025
7e72d41
[JAX] NVFP4 recipe with option to enable/disable SR, RHT, and 2D quan…
jberchtold-nvidia Oct 22, 2025
9b75db3
Include TE core headers in final build (#2291)
ksivaman Oct 23, 2025
8b9849a
Overhaul the compilation for the arch-specific features (#2279)
ptrendx Oct 23, 2025
c4c185d
[PyTorch] Add max_logit support for MuonClip (#2195)
cyanguwa Oct 25, 2025
fa71964
[PyTorch] Fix CI failures due to deterministic attention backend (#2288)
ksivaman Oct 20, 2025
fe9b150
[JAX] Fix: Skip determinism tests for bprop for all sm >=100 (#2315)
KshitijLakhani Oct 30, 2025
0acd0e7
[PyTorch] Fix attention backend and tests for `sm120` (#2320)
ksivaman Oct 30, 2025
9cc089a
[PyT] Bump the min version expected to supported FP8 current scaling …
KshitijLakhani Oct 30, 2025
70f5366
[JAX] Ensure JAX reference impl uses an accurate backend in our tests…
jberchtold-nvidia Oct 30, 2025
bae9d3a
[Version] Reset to TransformerEngine v2.9 (#5)
lxd-cumt Dec 11, 2025
e13e38a
Fix import bugs (#6)
lxd-cumt Dec 11, 2025
ef41367
Fix flash-attention fallback failures (#7)
lxd-cumt Dec 17, 2025
fd5f657
Multi-Backend Architecture Implementation for TransformerEngine-FL (#4)
lihongyang1990 Dec 29, 2025
57adff4
Add missing __init__.py files and policy test suite (#9)
lihongyang1990 Jan 4, 2026
ec8edfc
Polish readme (#11)
lxd-cumt Jan 4, 2026
b26b226
Register get_attention_backend for all backends and fix FlashAttentio…
lihongyang1990 Jan 6, 2026
a423680
fix nv shared lib bug. (#16)
lihongyang1990 Jan 7, 2026
fbe34bd
Add a new vendor implementation named hygon (#15)
Jan 12, 2026
396794e
Update the way the gems context is invoked in the FlagOS Backend (#18)
lxd-cumt Jan 12, 2026
3d80e63
Unify the usage of the gems context (#20)
lxd-cumt Jan 12, 2026
f101d2c
fix: torch SDPA backend multi-batch support (#17)
lihongyang1990 Jan 12, 2026
832a797
Remove use_gems context and call flag_gems.xxx directly (#22)
lxd-cumt Jan 13, 2026
08cabba
Add new vendor backend METAX (#21)
dinghaodhd Jan 16, 2026
03d1998
Add multi_tensor_adam_param_remainder and context parallel support (#23)
lihongyang1990 Jan 21, 2026
54390c7
Fix enum mismatch in plugins (#25)
lxd-cumt Jan 22, 2026
48c8480
add Vendor KUNLUNXIN (#27)
ssuurrffaaccee Jan 25, 2026
de00a8a
Fix the incorrect registration on Kunlunxin (#29)
lxd-cumt Jan 26, 2026
35e1809
Polish available check for kunlunxin (#30)
lxd-cumt Jan 26, 2026
8690ab4
Add new register op get_attention_backend for METAX (#31)
dinghaodhd Jan 28, 2026
b0a5934
[iluvatar]add vendor/iluvatar backend (#35)
DannyP0 Feb 5, 2026
12b2077
Fix: Resolve parameter mismatch between TE_FL and NVTE functions (#34)
lihongyang1990 Feb 10, 2026
f808816
[CICD] Add workflows to validate TE QA test cases (#41)
Darryl233 Mar 2, 2026
47e8ee7
Refactor optimizer implementations and improve multi_tensor ops (#36)
lihongyang1990 Mar 3, 2026
acced6d
tefl musa support (#42)
jiamingwang-mt Mar 11, 2026
4f54860
Add python-level patches to supporting multiple platforms (#49)
lxd-cumt Mar 23, 2026
7f788a3
Add scaled_masked_softmax_forward/backward for flagos backend (#52)
lxd-cumt Mar 25, 2026
1f98511
Fix quantizer dtype conversion errors (#54)
lxd-cumt Mar 26, 2026
2188137
apply flagos te_groups_gemm op (#55)
chai-xiaonan Mar 30, 2026
ebcfadc
[CICD] support Metax MACA workflow (#48)
qqjxzxq Apr 2, 2026
9d1c48a
[CICD] Upload unittest coverage report to FlagCICD platform && Access…
BrianPei Apr 9, 2026
d5ada9d
merge(dev): integrate upstream release_v2.14
Apr 10, 2026
46b77e4
plugin: sync plugin APIs with upstream csrc changes
Apr 13, 2026
3230b42
patch: normalize new upstream 'cuda' string hardcoding to te_device_t…
Apr 13, 2026
0ebf525
fix: update stale references in fork code to match upstream renames
Apr 13, 2026
3c86a95
plugin: sync plugin APIs with upstream csrc changes
Apr 13, 2026
cc03ca3
fix(stage9): add bottom_right_diagonal and cuda_graph params to fused…
Apr 14, 2026
4db46ce
fix(stage9): replace stale CPUOffloadEnabled with is_cpu_offload_enab…
Apr 14, 2026
8fa8199
Final Polish
Apr 15, 2026
d7e9e7b
[CICD] Refactor workflows, Add integration_tests, Switch to FlagCICD …
BrianPei Apr 24, 2026
ae664ea
[CICD] Refactor workflows, Add integration_tests, Switch to FlagCICD …
BrianPei Apr 24, 2026
24c28d0
merge: integrate upstream release_v2.14 via tree replacement
May 9, 2026
36af46a
chore: remove SYNC_POINT.md (intermediate sync record, not needed on …
May 9, 2026
e2812ae
fix commit init
May 9, 2026
7b33144
Fix pylint errors: remove unused imports and correct import order
May 9, 2026
2c334ae
Fix fused_rope_backward: add missing start_positions parameter to plu…
May 11, 2026
879eddc
fix test_numerics unit test
May 12, 2026
e12589a
fix Latex not found errors, use mathjax
May 12, 2026
e5c8380
Fix Sphinx build warnings: suppress autoapi import resolution and unk…
May 12, 2026
15 changes: 15 additions & 0 deletions .github/configs/ascend.yml
Collaborator:
This configuration file is not derived from the template that is used by cuda.yml and metax.yml.

@@ -0,0 +1,15 @@
# Huawei Ascend NPU configuration
image: ascend-infer:ubuntu18.04
labels:
  - npu
  - ascend
docker_options: |
  --device /dev/davinci0
  --device /dev/davinci1
  --device /dev/davinci2
  --device /dev/davinci3
  --device /dev/davinci_manager
  --device /dev/devmm_svm
  --device /dev/hisi_hdc
  --volume /usr/local/Ascend/driver:/usr/local/Ascend/driver
  --volume /usr/local/Ascend/add-ons:/usr/local/Ascend/add-ons
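
Following the collaborator comment above, a minimal sketch of what ascend.yml could look like if aligned with the schema shared by cuda.yml and metax.yml (shown below). Field names mirror those files; the display name, runner labels, and setup-script path are assumptions, not part of this PR:

# Hypothetical ascend.yml rewritten against the cuda.yml/metax.yml schema
hardware_name: ascend
display_name: 'Huawei Ascend (NPU)'   # assumed
ci_image: ascend-infer:ubuntu18.04
runner_labels:                        # assumed labels
  - npu
  - ascend
container_options: >-
  --device /dev/davinci0
  --device /dev/davinci_manager
  --device /dev/devmm_svm
  --device /dev/hisi_hdc
container_volumes:
  - /usr/local/Ascend/driver:/usr/local/Ascend/driver
  - /usr/local/Ascend/add-ons:/usr/local/Ascend/add-ons
setup_script: .github/scripts/setup_ascend.sh   # hypothetical path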
70 changes: 70 additions & 0 deletions .github/configs/cuda.yml
@@ -0,0 +1,70 @@
# CUDA Hardware Configuration for TransformerEngine-FL
# Refactored for A100 Nodes
# This file defines environment variables, volumes, and test filters for TE tests.

hardware_name: cuda
display_name: 'NVIDIA CUDA (A100)'

# CI image for online env
ci_image: harbor.baai.ac.cn/flagscale/cuda12.8.1-torch2.7.1-python3.10-te2.9:20260209

# Runner labels for self-hosted A100 node
# runner_labels:
#   - self-hosted
#   - Linux
#   - X64
#   - nvidia
#   - gpu-8

# Runner labels for online env
runner_labels:
  - nv-8g-cicd-te

# Container volumes
container_volumes:
  - /home/flagscale_cicd/flask/static:/workspace/report

# Container options
container_options: >-
  --privileged
  --gpus all
  --shm-size=500g
  --ipc=host
  --ulimit memlock=-1
  --ulimit stack=67108864
  --user root

# Platform-specific environment setup script
setup_script: .github/scripts/setup_cuda.sh

# Build environment variables (platform-specific)
build_env:
  TE_FL_SKIP_CUDA: '0'
  SKIP_CUDA_BUILD: '0'
  NVTE_WITH_CUDA: '1'
  NVTE_WITH_MACA: '0'
  TE_WITH_NCCL: '1'
  NVTE_FRAMEWORK: pytorch
  CUDA_HOME: /usr/local/cuda-12.8
  NVCC: /usr/local/cuda-12.8/bin/nvcc

# Device types to run tests on
device_types:
  - a100

# Test matrix configuration
test_matrix:
  l0_pytorch:
    path: 'qa/L0_pytorch_unittest/test.sh'
    ignored_tests:
      - test_sanity_layernorm_mlp
      - test_sanity_gpt
      - test_sanity_bert
      - test_sanity_T5
      - test_sanity_amp_and_nvfuser
      - test_sanity_drop_path
      - test_layernorm_mlp_accuracy
      - test_grouped_linear_accuracy
      - test_gpt_accuracy
      - test_basic_linear
      - test_layer_norm
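
To make the ignored_tests contract concrete, a minimal sketch of how a CI step could turn that list into a pytest deselect expression. The consumer script itself is an assumption (it is not part of this PR); python3 with PyYAML is assumed available in the CI image:

#!/usr/bin/env bash
# Hypothetical consumer of test_matrix.l0_pytorch.ignored_tests (not in this PR)
set -euo pipefail
CONFIG=.github/configs/cuda.yml
python3 - "$CONFIG" <<'EOF'
import sys
import yaml  # PyYAML, assumed present in the CI image

cfg = yaml.safe_load(open(sys.argv[1]))
ignored = cfg["test_matrix"]["l0_pytorch"]["ignored_tests"]
# pytest -k with "not <name>" clauses deselects every listed test by name
print('pytest -k "' + " and ".join(f"not {t}" for t in ignored) + '"')
EOF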
68 changes: 68 additions & 0 deletions .github/configs/metax.yml
@@ -0,0 +1,68 @@
# Metax Hardware Configuration for TE-FL
# This file defines CI/CD settings for Metax-based testing
# This file defines environment variables, volumes, and test filters for TE tests.

hardware_name: metax
display_name: 'Metax Tests'

# CI image for Metax dev env
# ci_image: localhost:5000/megatron-lm-with-te:v1

# CI image for online env
ci_image: harbor.baai.ac.cn/flagscale/megatron-lm-with-te:202603231839

# Runner labels for self-hosted Metax node
# runner_labels:
#   - self-hosted
#   - Linux
#   - X64
#   - metax
#   - dev

# Runner labels for online env
runner_labels:
  - mx-4g-cicd-te

# Container volumes
container_volumes:
  - /nfs/metax_fs:/nfs/metax_fs

# Container options
container_options: >-
  --uts=host
  --ipc=host
  --privileged=true
  --group-add video
  --shm-size=100gb
  --ulimit memlock=-1
  --user root
  --ulimit nofile=65535:65535
  -e PLATFORM=metax
  -e TORCH_DISTRIBUTED_BACKEND=mccl
  -e LD_LIBRARY_PATH=/opt/maca/lib:/usr/local/lib:$LD_LIBRARY_PATH

# Platform-specific environment setup script
setup_script: .github/scripts/setup_metax.sh

# Build environment variables (platform-specific)
build_env:
  TE_FL_SKIP_CUDA: '1'
  NVTE_WITH_MACA: '1'
  CUDA_HOME: /opt/maca
  MACA_HOME: /opt/maca

# Device types to run tests on
device_types:
  - c500

# Test matrix configuration
test_matrix:
  unit:
    devices:
      - c500
    # Ignored test files for unit tests
    # These files will be skipped when running pytest
    ignored_tests:
      # example: tests/unit_tests/test_example.py
      # - tests/unit_tests/test_inference.py
      # - tests/unit_tests/test_rl_utils.py
16 changes: 16 additions & 0 deletions .github/configs/template.yml
@@ -0,0 +1,16 @@
# Configuration Template
# This file describes the structure for hardware-specific configurations.
#
# Fields:
# - image: Docker image to use for the runner
# - labels: List of labels for the runner
# - docker_options: Additional Docker options for mounting devices, volumes, etc.
#
# Example:
# image: <docker_image>
# labels:
#   - <label1>
#   - <label2>
# docker_options: |
#   --option1 value1
#   --option2 value2
25 changes: 25 additions & 0 deletions .github/scripts/setup_cuda.sh
@@ -0,0 +1,25 @@
#!/usr/bin/env bash
# CUDA Platform Environment Setup Script
# Called by unit_tests_common.yml for CUDA platforms (A100, H100, etc.)
set -euo pipefail

echo "===== Step 0: Activate Python environment ====="
source /opt/miniconda3/etc/profile.d/conda.sh
conda activate flagscale-train
echo "PATH=$PATH" >> $GITHUB_ENV
echo "Python: $(which python3) ($(python3 --version 2>&1))"

echo "===== Step 1: Remove Existing TransformerEngine ====="
pip uninstall transformer_engine transformer_engine_torch -y || true

echo "===== Step 2: Build & Install TransformerEngine ====="
cd $GITHUB_WORKSPACE

pip install nvdlfw-inspect --quiet
pip install expecttest --quiet
pip install . -v --no-deps --no-build-isolation

echo "===== Step 3: Verify Installation ====="
python3 tests/pytorch/test_sanity_import.py

echo "===== Environment Setup Complete ====="
50 changes: 50 additions & 0 deletions .github/scripts/setup_metax.sh
@@ -0,0 +1,50 @@
#!/usr/bin/env bash
# Metax Platform Environment Setup Script
# Called by unit_tests_common.yml for Metax platforms (C500, etc.)
set -euo pipefail

echo "===== Step 0: Activate Python environment ====="
source /opt/conda/etc/profile.d/conda.sh
conda activate base
echo "PATH=$PATH" >> $GITHUB_ENV
echo "Python: $(which python3) ($(python3 --version 2>&1))"

echo "===== Step 1: Base Environment Setup ====="
# Configure MACA toolchain paths
export PATH=/opt/maca/bin:$PATH
export LD_LIBRARY_PATH=/opt/maca/lib:$LD_LIBRARY_PATH
service ssh restart

echo "===== Step 2: Create nvcc Symlink (cucc -> nvcc) ====="
# TransformerEngine expects nvcc, but MACA provides cucc
ln -sf /opt/maca/tools/cu-bridge/bin/cucc /opt/maca/tools/cu-bridge/bin/nvcc
which nvcc || true

echo "===== Step 3: Install Required System Tools ====="
# Use apt to install git, curl
sed -i 's|http://mirrors.aliyun.com/ubuntu|http://archive.ubuntu.com/ubuntu|g' /etc/apt/sources.list
apt-get update -qq || true
apt-get install -y -qq git curl
# Install cmake and ninja via pip (more reliable than apt in this env)
python3 -m pip install cmake ninja torch --no-cache-dir

echo "===== Step 4: Remove Existing TransformerEngine ====="
# Prevent conflicts with preinstalled or incompatible versions
python3 -m pip uninstall transformer_engine -y || true
python3 -m pip install nvdlfw-inspect --no-deps || true

echo "===== Step 5: Install TE-FL Plugin Layer ====="
# Install TransformerEngine-FL Python layer (plugin logic)
cd $GITHUB_WORKSPACE
TE_FL_SKIP_CUDA=1 python3 setup.py install

echo "===== Step 6: Final Verification ====="
# Verify both TE Python API and backend are functional
python3 - <<'EOF'
import transformer_engine
import transformer_engine_torch as te
print("transformer_engine:", transformer_engine)
print("transformer_engine_torch:", te)
EOF

echo "===== Environment Setup Complete ====="
32 changes: 32 additions & 0 deletions .github/workflows/all_tests_ascend.yml
@@ -0,0 +1,32 @@
name: ascend_tests

on:
  # push:
  #   branches: ["main"]
  # pull_request:
  #   branches: ["main"]
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-${{ github.actor }}
  cancel-in-progress: true

jobs:
  run_tests:
    # Package manager and environment settings are read from .github/configs/ascend.yml
    uses: ./.github/workflows/all_tests_common.yml
    with:
      platform: ascend

  all_tests:
    needs: run_tests
    runs-on: ubuntu-latest
    if: always()
    steps:
      - name: Verify workflow status
        run: |
          if [ "${{ needs.run_tests.result }}" != "success" ]; then
            echo "❌ Tests workflow failed"
            exit 1
          fi
          echo "✅ All tests passed!"