forked from NVIDIA/TransformerEngine
TE-FL Upgrade: Synchronization with TE Release V2.14 #59
Open — lxd-cumt wants to merge 61 commits into `flagos-ai:dev+te2.14.0` from `lxd-cumt:merge/dev-to-main-20260410`.
| SHA | Author | Commit message |
|---|---|---|
| 966a5b9 | ptrendx | Changed VERSION to 2.9.0 |
| 739c656 | KshitijLakhani | [JAX] Fix imports in test for deprecated jax.experimental.pjit (#2274) |
| c2a643d | ksivaman | Wheels for cuda 13 (#2278) |
| 7e72d41 | jberchtold-nvidia | [JAX] NVFP4 recipe with option to enable/disable SR, RHT, and 2D quan… |
| 9b75db3 | ksivaman | Include TE core headers in final build (#2291) |
| 8b9849a | ptrendx | Overhaul the compilation for the arch-specific features (#2279) |
| c4c185d | cyanguwa | [PyTorch] Add max_logit support for MuonClip (#2195) |
| fa71964 | ksivaman | [PyTorch] Fix CI failures due to deterministic attention backend (#2288) |
| fe9b150 | KshitijLakhani | [JAX] Fix: Skip determinism tests for bprop for all sm >=100 (#2315) |
| 0acd0e7 | ksivaman | [PyTorch] Fix attention backend and tests for `sm120` (#2320) |
| 9cc089a | KshitijLakhani | [PyT] Bump the min version expected to supported FP8 current scaling … |
| 70f5366 | jberchtold-nvidia | [JAX] Ensure JAX reference impl uses an accurate backend in our tests… |
| bae9d3a | lxd-cumt | [Version] Reset to TransformerEngine v2.9 (#5) |
| e13e38a | lxd-cumt | Fix import bugs (#6) |
| ef41367 | lxd-cumt | Fix flash-attention fallback failures (#7) |
| fd5f657 | lihongyang1990 | Multi-Backend Architecture Implementation for TransformerEngine-FL (#4) |
| 57adff4 | lihongyang1990 | Add missing __init__.py files and policy test suite (#9) |
| ec8edfc | lxd-cumt | Polish readme (#11) |
| b26b226 | lihongyang1990 | Register get_attention_backend for all backends and fix FlashAttentio… |
| a423680 | lihongyang1990 | fix nv shared lib bug. (#16) |
| fbe34bd | | Add a new vendor implementation named hygon (#15) |
| 396794e | lxd-cumt | Update the way the gems context is invoked in the FlagOS Backend (#18) |
| 3d80e63 | lxd-cumt | Unify the usage of the gems context (#20) |
| f101d2c | lihongyang1990 | fix: torch SDPA backend multi-batch support (#17) |
| 832a797 | lxd-cumt | Remove use_gems context and call flag_gems.xxx directly (#22) |
| 08cabba | dinghaodhd | Add new vendor backend METAX (#21) |
| 03d1998 | lihongyang1990 | Add multi_tensor_adam_param_remainder and context parallel support (#23) |
| 54390c7 | lxd-cumt | Fix enum mismatch in plugins (#25) |
| 48c8480 | ssuurrffaaccee | add Vendor KUNLUNXIN (#27) |
| de00a8a | lxd-cumt | Fix the incorrect registration on Kunlunxin (#29) |
| 35e1809 | lxd-cumt | Polish available check for kunlunxin (#30) |
| 8690ab4 | dinghaodhd | Add new register op get_attention_backend for METAX (#31) |
| b0a5934 | DannyP0 | [iluvatar]add vendor/iluvatar backend (#35) |
| 12b2077 | lihongyang1990 | Fix: Resolve parameter mismatch between TE_FL and NVTE functions (#34) |
| f808816 | Darryl233 | [CICD] Add workflows to validate TE QA test cases (#41) |
| 47e8ee7 | lihongyang1990 | Refactor optimizer implementations and improve multi_tensor ops (#36) |
| acced6d | jiamingwang-mt | tefl musa support (#42) |
| 4f54860 | lxd-cumt | Add python-level patches to supporting multiple platforms (#49) |
| 7f788a3 | lxd-cumt | Add scaled_masked_softmax_forward/backward for flagos backend (#52) |
| 1f98511 | lxd-cumt | Fix quantizer dtype conversion errors (#54) |
| 2188137 | chai-xiaonan | apply flagos te_groups_gemm op (#55) |
| ebcfadc | qqjxzxq | [CICD] support Metax MACA workflow (#48) |
| 9d1c48a | BrianPei | [CICD] Upload unittest coverage report to FlagCICD platform && Access… |
| d5ada9d | | merge(dev): integrate upstream release_v2.14 |
| 46b77e4 | | plugin: sync plugin APIs with upstream csrc changes |
| 3230b42 | | patch: normalize new upstream 'cuda' string hardcoding to te_device_t… |
| 0ebf525 | | fix: update stale references in fork code to match upstream renames |
| 3c86a95 | | plugin: sync plugin APIs with upstream csrc changes |
| cc03ca3 | | fix(stage9): add bottom_right_diagonal and cuda_graph params to fused… |
| 4db46ce | | fix(stage9): replace stale CPUOffloadEnabled with is_cpu_offload_enab… |
| 8fa8199 | | Final Polish |
| d7e9e7b | BrianPei | [CICD] Refactor workflows, Add integration_tests, Switch to FlagCICD … |
| ae664ea | BrianPei | [CICD] Refactor workflows, Add integration_tests, Switch to FlagCICD … |
| 24c28d0 | | merge: integrate upstream release_v2.14 via tree replacement |
| 36af46a | | chore: remove SYNC_POINT.md (intermediate sync record, not needed on … |
| e2812ae | | fix commit init |
| 7b33144 | | Fix pylint errors: remove unused imports and correct import order |
| 2c334ae | | Fix fused_rope_backward: add missing start_positions parameter to plu… |
| 879eddc | | fix test_numerics unit test |
| e12589a | | fix Latex not found errors, use mathjax |
| e5c8380 | | Fix Sphinx build warnings: suppress autoapi import resolution and unk… |
New file (`@@ -0,0 +1,15 @@`):

```yaml
# Huawei Ascend NPU configuration
image: ascend-infer:ubuntu18.04
labels:
  - npu
  - ascend
docker_options: |
  --device /dev/davinci0
  --device /dev/davinci1
  --device /dev/davinci2
  --device /dev/davinci3
  --device /dev/davinci_manager
  --device /dev/devmm_svm
  --device /dev/hisi_hdc
  --volume /usr/local/Ascend/driver:/usr/local/Ascend/driver
  --volume /usr/local/Ascend/add-ons:/usr/local/Ascend/add-ons
```
New file (`@@ -0,0 +1,70 @@`):

```yaml
# CUDA Hardware Configuration for TransformerEngine-FL
# Refactored for A100 Nodes
# This file defines environment variables, volumes, and test filters for TE tests.

hardware_name: cuda
display_name: 'NVIDIA CUDA (A100)'

# CI image for online env
ci_image: harbor.baai.ac.cn/flagscale/cuda12.8.1-torch2.7.1-python3.10-te2.9:20260209

# Runner labels for self-hosted A100 node
# runner_labels:
#   - self-hosted
#   - Linux
#   - X64
#   - nvidia
#   - gpu-8

# Runner labels for online env
runner_labels:
  - nv-8g-cicd-te

# Container volumes
container_volumes:
  - /home/flagscale_cicd/flask/static:/workspace/report

# Container options
container_options: >-
  --privileged
  --gpus all
  --shm-size=500g
  --ipc=host
  --ulimit memlock=-1
  --ulimit stack=67108864
  --user root

# Platform-specific environment setup script
setup_script: .github/scripts/setup_cuda.sh

# Build environment variables (platform-specific)
build_env:
  TE_FL_SKIP_CUDA: '0'
  SKIP_CUDA_BUILD: '0'
  NVTE_WITH_CUDA: '1'
  NVTE_WITH_MACA: '0'
  TE_WITH_NCCL: '1'
  NVTE_FRAMEWORK: pytorch
  CUDA_HOME: /usr/local/cuda-12.8
  NVCC: /usr/local/cuda-12.8/bin/nvcc

# Device types to run tests on
device_types:
  - a100

# Test matrix configuration
test_matrix:
  l0_pytorch:
    path: 'qa/L0_pytorch_unittest/test.sh'
    ignored_tests:
      - test_sanity_layernorm_mlp
      - test_sanity_gpt
      - test_sanity_bert
      - test_sanity_T5
      - test_sanity_amp_and_nvfuser
      - test_sanity_drop_path
      - test_layernorm_mlp_accuracy
      - test_grouped_linear_accuracy
      - test_gpt_accuracy
      - test_basic_linear
      - test_layer_norm
```
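As a sketch of how a CI harness might consume an `ignored_tests` list like the one above, here is a minimal Python helper that turns a parsed `test_matrix` entry into pytest `--deselect` arguments. The function name `build_pytest_deselects` is hypothetical and not part of TE-FL; the actual harness may filter tests differently.

```python
def build_pytest_deselects(suite_cfg):
    """Build a pytest-style argument list from one test_matrix entry.

    suite_cfg: a dict parsed from the hardware YAML, e.g. the `l0_pytorch`
    entry above. This is an illustrative helper, not the real TE-FL code.
    """
    args = [suite_cfg["path"]]
    # Each ignored test becomes a --deselect flag so pytest skips it.
    for name in suite_cfg.get("ignored_tests") or []:
        args += ["--deselect", name]
    return args

suite_cfg = {
    "path": "qa/L0_pytorch_unittest/test.sh",
    "ignored_tests": ["test_sanity_gpt", "test_layer_norm"],
}
print(build_pytest_deselects(suite_cfg))
# → ['qa/L0_pytorch_unittest/test.sh', '--deselect', 'test_sanity_gpt', '--deselect', 'test_layer_norm']
```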
New file (`@@ -0,0 +1,68 @@`):

```yaml
# Metax Hardware Configuration for TE-FL
# This file defines CI/CD settings for Metax-based testing
# This file defines environment variables, volumes, and test filters for TE tests.

hardware_name: metax
display_name: 'Metax Tests'

# CI image for Metax dev env
# ci_image: localhost:5000/megatron-lm-with-te:v1

# CI image for online env
ci_image: harbor.baai.ac.cn/flagscale/megatron-lm-with-te:202603231839

# Runner labels for self-hosted Metax node
# runner_labels:
#   - self-hosted
#   - Linux
#   - X64
#   - metax
#   - dev

# Runner labels for online env
runner_labels:
  - mx-4g-cicd-te

# Container volumes
container_volumes:
  - /nfs/metax_fs:/nfs/metax_fs

# Container options
container_options: >-
  --uts=host
  --ipc=host
  --privileged=true
  --group-add video
  --shm-size=100gb
  --ulimit memlock=-1
  --user root
  --ulimit nofile=65535:65535
  -e PLATFORM=metax
  -e TORCH_DISTRIBUTED_BACKEND=mccl
  -e LD_LIBRARY_PATH=/opt/maca/lib:/usr/local/lib:$LD_LIBRARY_PATH

# Platform-specific environment setup script
setup_script: .github/scripts/setup_metax.sh

# Build environment variables (platform-specific)
build_env:
  TE_FL_SKIP_CUDA: '1'
  NVTE_WITH_MACA: '1'
  CUDA_HOME: /opt/maca
  MACA_HOME: /opt/maca

# Device types to run tests on
device_types:
  - c500

# Test matrix configuration
test_matrix:
  unit:
    devices:
      - c500
    # Ignored test files for unit tests
    # These files will be skipped when running pytest
    ignored_tests:
      # example: tests/unit_tests/test_example.py
      # - tests/unit_tests/test_inference.py
      # - tests/unit_tests/test_rl_utils.py
```
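The `container_options: >-` folded scalar above collapses to a single space-separated string at parse time. A harness that launches the container then needs to split it back into an argument vector; a minimal sketch (assuming the harness shells out to `docker run`, which is an assumption, not confirmed by the workflow files):

```python
import shlex

# What a YAML parser would yield for the folded scalar above (shortened here).
container_options = (
    "--uts=host --ipc=host --privileged=true --group-add video "
    "--shm-size=100gb -e PLATFORM=metax"
)

# shlex.split respects quoting, so it is safer than str.split for CLI options.
argv = ["docker", "run"] + shlex.split(container_options)
print(argv)
```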
New file (`@@ -0,0 +1,16 @@`):

```yaml
# Configuration Template
# This file describes the structure for hardware-specific configurations.
#
# Fields:
# - image: Docker image to use for the runner
# - labels: List of labels for the runner
# - docker_options: Additional Docker options for mounting devices, volumes, etc.
#
# Example:
# image: <docker_image>
# labels:
#   - <label1>
#   - <label2>
# docker_options: |
#   --option1 value1
#   --option2 value2
```
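A template like this lends itself to a simple structural check: every vendor config should at least provide the fields the template names. The sketch below assumes the template's three fields are the required set; `missing_fields` is a hypothetical helper, not part of the TE-FL CI.

```python
# Required fields, taken from the template's documented structure.
REQUIRED_FIELDS = {"image", "labels", "docker_options"}

def missing_fields(config):
    """Return template fields absent from a parsed vendor config dict.

    `missing_fields` is an illustrative helper; the real CI may validate
    configs differently (or not at all).
    """
    return sorted(REQUIRED_FIELDS - config.keys())

# A vendor config that forgot docker_options:
ascend_cfg = {"image": "ascend-infer:ubuntu18.04", "labels": ["npu", "ascend"]}
print(missing_fields(ascend_cfg))
# → ['docker_options']
```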
New file (`@@ -0,0 +1,25 @@`):

```bash
#!/usr/bin/env bash
# CUDA Platform Environment Setup Script
# Called by unit_tests_common.yml for CUDA platforms (A100, H100, etc.)
set -euo pipefail

echo "===== Step 0: Activate Python environment ====="
source /opt/miniconda3/etc/profile.d/conda.sh
conda activate flagscale-train
echo "PATH=$PATH" >> $GITHUB_ENV
echo "Python: $(which python3) ($(python3 --version 2>&1))"

echo "===== Step 1: Remove Existing TransformerEngine ====="
pip uninstall transformer_engine transformer_engine_torch -y || true

echo "===== Step 2: Build & Install TransformerEngine ====="
cd $GITHUB_WORKSPACE

pip install nvdlfw-inspect --quiet
pip install expecttest --quiet
pip install . -v --no-deps --no-build-isolation

echo "===== Step 3: Verify Installation ====="
python3 tests/pytorch/test_sanity_import.py

echo "===== Environment Setup Complete ====="
```
New file (`@@ -0,0 +1,50 @@`):

```bash
#!/usr/bin/env bash
# Metax Platform Environment Setup Script
# Called by unit_tests_common.yml for Metax platforms (C500, etc.)
set -euo pipefail

echo "===== Step 0: Activate Python environment ====="
source /opt/conda/etc/profile.d/conda.sh
conda activate base
echo "PATH=$PATH" >> $GITHUB_ENV
echo "Python: $(which python3) ($(python3 --version 2>&1))"

echo "===== Step 1: Base Environment Setup ====="
# Configure MACA toolchain paths
export PATH=/opt/maca/bin:$PATH
export LD_LIBRARY_PATH=/opt/maca/lib:$LD_LIBRARY_PATH
service ssh restart

echo "===== Step 2: Create nvcc Symlink (cucc -> nvcc) ====="
# TransformerEngine expects nvcc, but MACA provides cucc
ln -sf /opt/maca/tools/cu-bridge/bin/cucc /opt/maca/tools/cu-bridge/bin/nvcc
which nvcc || true

echo "===== Step 3: Install Required System Tools ====="
# Use apt to install git, curl
sed -i 's|http://mirrors.aliyun.com/ubuntu|http://archive.ubuntu.com/ubuntu|g' /etc/apt/sources.list
apt-get update -qq || true
apt-get install -y -qq git curl
# Install cmake and ninja via pip (more reliable than apt in this env)
python3 -m pip install cmake ninja torch --no-cache-dir

echo "===== Step 4: Remove Existing TransformerEngine ====="
# Prevent conflicts with preinstalled or incompatible versions
python3 -m pip uninstall transformer_engine -y || true
python3 -m pip install nvdlfw-inspect --no-deps || true

echo "===== Step 5: Install TE-FL Plugin Layer ====="
# Install TransformerEngine-FL Python layer (plugin logic)
cd $GITHUB_WORKSPACE
TE_FL_SKIP_CUDA=1 python3 setup.py install

echo "===== Step 6: Final Verification ====="
# Verify both TE Python API and backend are functional
python3 - <<'EOF'
import transformer_engine
import transformer_engine_torch as te
print("transformer_engine:", transformer_engine)
print("transformer_engine_torch:", te)
EOF

echo "===== Environment Setup Complete ====="
```
New file (`@@ -0,0 +1,32 @@`):

```yaml
name: ascend_tests

on:
  # push:
  #   branches: ["main"]
  # pull_request:
  #   branches: ["main"]
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-${{ github.actor }}
  cancel-in-progress: true

jobs:
  run_tests:
    # Package manager and environment settings are read from .github/configs/ascend.yml
    uses: ./.github/workflows/all_tests_common.yml
    with:
      platform: ascend

  all_tests:
    needs: run_tests
    runs-on: ubuntu-latest
    if: always()
    steps:
      - name: Verify workflow status
        run: |
          if [ "${{ needs.run_tests.result }}" != "success" ]; then
            echo "❌ Tests workflow failed"
            exit 1
          fi
          echo "✅ All tests passed!"
```
Review comment: This configuration file is not derived from the template used by cuda.yml and meta.yml.