TE-FL Upgrade: Synchronization with TE Release V2.14 (#59)
Open
lxd-cumt wants to merge 61 commits into
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
…A#2274)
* Fix imports in test for deprecated jax.experimental.pjit
* Fix: Pass NamedSharding instead of PartitionSpec to compare_ops() so that when the in and out sharding is used to create a jitted function, it has the mesh info
Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
Signed-off-by: Kshitij Janardan Lakhani <klakhani@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kshitij Janardan Lakhani <klakhani@login-eos01.eos.clusters.nvidia.com>
* Support wheel build for cuda 13
* Fixes
* Fixes for cu13 runtime, format
* Add documentation
* Better error handling
* fix
* fix jax sdist
* Modify function names
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
…tization (NVIDIA#2270)
* [JAX] Support recipe flags for disabling SR, RHT, and 2D quantization
* Fix issue with SR state being erased due to pytree handling of NVFP4Quantizer
* Add test for SR state preservation across VJP boundaries
* Fix sharding of SR rng state
* update tolerances slightly now that SR is enabled
* Use hashlib for deterministic hashes across runs for SR
* rename uses_rht on scaled tensors to has_applied_rht
* add assert
* Move decision of whether to use RHT into helper.py and add dedicated RHT tests
* fix use_rht attr usage
* fix pure-jax rht usage criteria
* Adjust tolerances after rebase
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
Include TE core headers in build Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Added sm_120f to the build
* Change the arch specific handling
* Support for CUDA<12.9
* Moved through the rest of the files
* Common cases
* Remove pure 100 from the list
* CMake changes (not yet working)
* Do not pass the arch-specific thing from build_tools
* Moved some of the files to arch-specific compilation
* Fix and also changing the order of compilation to hopefully get the compilation time lower
* Fix for the files overwriting custom compile properties
* Actually make this whole thing work
* Add space to the error message
* Apply suggestions from code review
* Fixes from review
* Changing the naming to be more intuitive
* Add missing cassert include for device-side asserts
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
* add max_score for fused/unfused F16 non-CP
* calculate max per head instead of max over all heads
* fix fused attn max_score shape
* revert FE to github
* update FE to 1.15.0-rc
* reduce ew kernels; fix causal masks; add more tests
* remove logic for flash-attn
* WIP: add CP support for p2p/a2a/all_gather
* minor improvements of implementation/tests
* WIP: add thd support
* add thd to UnfusedDPA
* update to FE 1.15
* remove unneeded changes
* disable unfused for thd + pad_between_seqs
* disable thd for unfused until bug is fixed
* fix all_gather
* rename max_score to max_logit
* disable fused attn + thd
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…DIA#2288)
* Fix CI failures due to deterministic attention
* some more cleanup
* Fix debug test
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
…2315)
* Fix: Skip determinism tests for bprop for all sm >=100
* Add username to TODO
* Assert in fused attn bwd pass for sm100+
Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix attention backend and tests for sm120
* Disable MLA only for backward
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
…determinism on Blackwell (NVIDIA#2316)
* Bump the min version expected to support FP8 CS determinism on Blackwell
* Disable fused attn for cudnn < 9.14 for FP8 CS; disable fused attn for cudnn < 9.18 for FP8 deterministic CS
Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…NVIDIA#2322) Ensure JAX reference impl uses an accurate backend Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
# Description
Add the FlagOS multi-chip backend for TransformerEngine.
Co-authored-by: zhaoyinglia <ylzhao@baai.ac.cn>
# Description
Fix import bugs.
…lagos-ai#4)
# TransformerEngine-FL Plugin System
## Overview
This PR implements a comprehensive multi-backend plugin system for TransformerEngine-FL, enabling support for multiple hardware vendors (NVIDIA, AMD, Hygon, etc.) while maintaining full API compatibility with the original `transformer_engine_torch`.
**Core Philosophy**: A plugin-based backend system that allows hardware vendors to easily implement their own operator optimizations while preserving complete compatibility with the original TransformerEngine API.
## Key Features
### Full API Compatibility
- Drop-in replacement for `transformer_engine_torch`
- Switch backends via environment variables
- Zero changes required to existing user code
### Multi-Backend Support
| Backend | Description | Implementation |
|---------|-------------|----------------|
| **FlagOS (default)** | Triton-based cross-platform implementation | `backends/flagos/` |
| **CUDA (vendor)** | Wraps original TransformerEngine C++ extensions | `backends/vendor/cuda/` |
| **Reference** | Pure PyTorch fallback implementation | `backends/reference/` |
### Three-Tier Backend Selection
```
1. TE_FL_PER_OP (Per-operator override)   [Highest]
   Example: TE_FL_PER_OP="rmsnorm_fwd=vendor:cuda"
2. TE_FL_PREFER (Global preference)
   Values: flagos / vendor / reference
3. Backend Priority (Intrinsic)           [Lowest]
   Each implementation has a priority value
```
## Architecture
### Directory Structure
```
transformer_engine/plugin/core/
├── __init__.py          # Public API exports
├── types.py             # Core types: BackendImplKind, OpImpl
├── registry.py          # OpRegistry: stores all implementations
├── manager.py           # OpManager: selects and calls implementations
├── policy.py            # SelectionPolicy: backend selection rules
├── discovery.py         # Plugin auto-discovery (entry_points, env)
├── builtin_ops.py       # Registers all built-in backends
├── ops.py               # TEFLModule: transformer_engine_torch compatible API
├── logger_manager.py    # Logging utilities
├── _module_setup.py     # Module aliasing setup
├── _build_config.py     # Build-time configuration
│
└── backends/
    ├── flagos/              # FlagOS backend (Triton-based)
    │   ├── flagos.py        # FlagOSBackend class
    │   ├── register_ops.py  # Operator registration
    │   └── impl/            # Operator implementations
    │       ├── rmsnorm.py
    │       ├── gemm.py
    │       └── ...
    │
    ├── vendor/              # Vendor backends
    │   └── cuda/            # NVIDIA CUDA backend
    │       ├── cuda.py      # CUDABackend class
    │       └── register_ops.py
    │
    └── reference/           # Reference backend (PyTorch)
        ├── reference.py     # ReferenceBackend class
        ├── register_ops.py
        └── impl/            # Pure PyTorch implementations
```
### Core Components
| File | Description |
|------|-------------|
| `types.py` | Defines `BackendImplKind` (DEFAULT/VENDOR/REFERENCE) and `OpImpl` dataclass |
| `registry.py` | `OpRegistry` - Central storage for all operator implementations |
| `manager.py` | `OpManager` - Handles implementation selection, fallback, and execution |
| `policy.py` | `SelectionPolicy` - Configurable rules for backend selection |
| `discovery.py` | Auto-discovers plugins via `entry_points` or `TE_FL_PLUGIN_MODULES` |
| `ops.py` | `TEFLModule` - Provides `transformer_engine_torch` compatible interface |
## Installation
### Build with CUDA support
```bash
pip install --no-build-isolation -e .
```
### Build without CUDA (FlagOS only)
```bash
TE_FL_SKIP_CUDA=1 pip install --no-build-isolation -e .
```
## Environment Variables
### Backend Selection
| Variable | Description | Values | Default |
|----------|-------------|--------|---------|
| `TE_FL_PREFER` | Preferred backend type | `flagos` / `vendor` / `reference` | `flagos` |
| `TE_FL_PREFER_VENDOR` | Prefer vendor (legacy) | `1` / `0` | `0` |
| `TE_FL_STRICT` | Strict mode (no fallback) | `1` / `0` | `0` |
### Vendor Filtering
| Variable | Description | Example |
|----------|-------------|---------|
| `TE_FL_ALLOW_VENDORS` | Allowed vendors (whitelist) | `nvidia,amd` |
| `TE_FL_DENY_VENDORS` | Denied vendors (blacklist) | `vendor_a` |
### Per-Operator Configuration
| Variable | Description | Example |
|----------|-------------|---------|
| `TE_FL_PER_OP` | Per-operator backend ordering | `rmsnorm_fwd=vendor:cuda\|default` |
### Plugin Discovery
| Variable | Description | Example |
|----------|-------------|---------|
| `TE_FL_PLUGIN_MODULES` | Plugin modules to load | `my_plugin,another_plugin` |
### Build Configuration
| Variable | Description | Values | Default |
|----------|-------------|--------|---------|
| `TE_FL_SKIP_CUDA` | Skip CUDA backend | `1` / `0` | `0` |
| `CUDA_HOME` | CUDA installation path | `/usr/local/cuda` | Auto-detected |
### Logging
| Variable | Description | Values | Default |
|----------|-------------|--------|---------|
| `TEFL_LOG_LEVEL` | Log level | `DEBUG` / `INFO` / `WARNING` / `ERROR` | `INFO` |
## Usage Examples
### Basic Usage (No Code Changes Required)
```python
# Existing code works as-is
import transformer_engine.pytorch as te
# or
import transformer_engine_torch as te
```
### Register Custom Backend (In-tree)
```python
import torch

from transformer_engine.plugin.core import (
    OpRegistry, OpManager, OpImpl, BackendImplKind
)

# 1. Define implementation
def my_rmsnorm(input, weight, eps=1e-5, **kwargs):
    variance = input.pow(2).mean(-1, keepdim=True)
    return input * torch.rsqrt(variance + eps) * weight, torch.rsqrt(variance + eps)

# 2. Register
registry = OpRegistry()
registry.register_impl(OpImpl(
    op_name="rmsnorm_fwd",
    impl_id="vendor.mybackend",
    kind=BackendImplKind.VENDOR,
    vendor="mybackend",
    fn=my_rmsnorm,
    priority=200,
))

# 3. Call
manager = OpManager(registry)
output, rsigma = manager.call("rmsnorm_fwd", input, weight)
```
### Register Custom Backend (Out-of-tree Plugin)
Create a plugin package with a `register(registry)` function:
```python
# my_vendor_plugin/__init__.py
from transformer_engine.plugin.core import OpImpl, BackendImplKind

def my_rmsnorm(input, weight, eps=1e-5, **kwargs):
    # Your implementation
    ...

def register(registry):
    """Called automatically by TE-FL"""
    registry.register_impl(OpImpl(
        op_name="rmsnorm_fwd",
        impl_id="vendor.myvendor",
        kind=BackendImplKind.VENDOR,
        vendor="myvendor",
        fn=my_rmsnorm,
        priority=200,
    ))
```
Load via environment variable:
```bash
export TE_FL_PLUGIN_MODULES=my_vendor_plugin
python your_script.py
```
## Runtime Logs
When running, you'll see logs indicating which backend is used:
```
[TE-FL manager.py:133 INFO] Registered impl_ids: ['default.flagos', 'reference.torch', 'vendor.cuda']
[TE-FL manager.py:390 INFO] Op 'rmsnorm_fwd' using 'default.flagos' (kind=default, vendor=None)
[TE-FL manager.py:395 INFO] Op 'rmsnorm_fwd' switched from 'default.flagos' to 'vendor.cuda' (kind=vendor, vendor=CUDA)
```
## Examples
See `transformer_engine/plugins/examples/` for complete working examples:
- `example_intree.py` - In-tree backend registration
- `example_outtree.py` - Out-of-tree plugin registration
Co-authored-by: panpy <panpy@sugon.com>
# Description
- Add missing __init__.py files to the transformer_engine/plugin/core/backends/flagos/attention/ directory tree to fix import errors when accessing these modules as Python packages
- Add a comprehensive test suite (test_policy.py) covering the TE-FL scheduling policy system, including:
  - SelectionPolicy creation and configuration
  - Environment variable parsing (TE_FL_PREFER, TE_FL_STRICT, etc.)
  - Policy context managers
  - Vendor filtering (allow/deny)
  - Thread safety validation
- Minor code style improvements
…n fallback (flagos-ai#14)
## Summary
This PR contains two major improvements:
1. **Register the `get_attention_backend` function for all backends** (CUDA, FlagOS, Reference)
   - Added a `get_attention_backend` implementation to all backend types
   - Ensures consistent attention backend selection across different hardware platforms
2. **Fix the FlashAttention fallback mechanism**
   - Removed the redundant `_called_impls` dictionary, replacing it with a simpler `_last_impl_id` class variable
   - Removed the unused `_log_lock` threading lock
   - Simplified implementation tracking and logging logic
   - Reduced code complexity and memory overhead while maintaining full functionality
## Changes
- Updated the `FlashAttentionBase` class in `ops.py` to remove redundant implementation tracking (see the sketch after this section)
- Added `get_attention_backend` registration to the CUDA, FlagOS, and Reference backends
- Fixed the fallback logic in attention backend selection
## Test Plan
- [x] Code builds successfully
- [x] Existing tests pass
- [x] Manual testing with different backend configurations
## Related Issues
Fixes issues with the FlashAttention fallback and improves backend consistency.
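A minimal sketch of the simplified tracking described above; the method name and logging setup are illustrative, not the actual `ops.py` code.

```python
import logging

logger = logging.getLogger("TE-FL")

class FlashAttentionBase:
    # A single class-level marker replaces the removed per-instance _called_impls dict.
    _last_impl_id = None

    def _note_impl(self, impl_id: str) -> None:
        # Log only when the selected implementation actually changes,
        # so repeated calls with the same backend stay quiet.
        if impl_id != FlashAttentionBase._last_impl_id:
            logger.info("FlashAttention now using '%s'", impl_id)
            FlashAttentionBase._last_impl_id = impl_id
```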
# Description
Fix the NV shared-lib bug: `[CUDA] Import failed: No module named 'transformer_engine_torch_nv'`.
# Description
This PR adds a Hygon backend for calling basic ops on Hygon DCU.
## Type of change
- [x] New feature (non-breaking change which adds functionality)
## Changes
- Add a new `hygon` folder in `vendor` containing `__init__.py`, `hygon.py`, and `register_ops.py`
- Register Hygon ops in `builtin_ops.py`
## Requirements
To use the Hygon backend, the Python package `transformer_engine_fl_hygon` needs to be installed.
Signed-off-by: wenjh <wenjh@sugon.com>
…gos-ai#18)
Add a flag that permanently enables flag_gems with a single switch, eliminating the need to call flag_gems.use_gems for every single operator. This removes significant registration overhead and improves end-to-end throughput. A sketch of the switch is shown below.
- When the flag is set, every operator's implementation is forced to use flagos/vendor; the default PyTorch reference backend is unavailable.
- When the flag is not set, operators can freely switch among the flagos, vendor, and torch backends.
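As a rough illustration of the single-switch idea; `TE_FL_GEMS_ALWAYS_ON` is a hypothetical flag name, and the real wiring in TE-FL may differ.

```python
import os
import flag_gems  # Triton kernel library used by the flagos backend

# Hypothetical flag name for illustration only.
if os.getenv("TE_FL_GEMS_ALWAYS_ON", "0") == "1":
    # Enter the gems context once at startup instead of wrapping every operator
    # call in flag_gems.use_gems(); the torch reference backend is then
    # effectively bypassed for the rest of the process.
    _gems_ctx = flag_gems.use_gems()
    _gems_ctx.__enter__()
```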
Unify the usage of the gems context:
- Only enter or exit the context when switching between the flagos backend and the torch backend (or vice versa), as sketched below.
- This avoids the overhead of repeated enter/exit calls across multiple ops.
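A minimal sketch of the switch-only handling, with assumed class and method names (not the actual TE-FL manager code):

```python
import flag_gems

class GemsContextGuard:
    """Enter/exit flag_gems.use_gems() only when the active backend changes."""

    def __init__(self):
        self._ctx = None

    def on_backend_selected(self, backend: str) -> None:
        if backend == "flagos" and self._ctx is None:
            self._ctx = flag_gems.use_gems()   # switching torch -> flagos
            self._ctx.__enter__()
        elif backend != "flagos" and self._ctx is not None:
            self._ctx.__exit__(None, None, None)  # switching flagos -> torch
            self._ctx = None
```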
## Summary
- Support combined qkv_layout formats like `sbhd_sbhd_sbhd` by extracting the first part for layout conversion (see the sketch below)
- Distinguish between the standard 4D tensor formats (sbhd/bshd) and the true packed format (thd). For 4D tensors, directly convert the layout as the flagos backend does, instead of incorrectly trying to unpack.
## Problem
When using the torch SDPA backend with `batch_size > 1`, the following error occurs:
```
ValueError: Unexpected 4D tensor shape torch.Size([4096, 4, 16, 128]). Expected [total_tokens, 1, num_heads, head_dim]
```
The original code incorrectly tried to unpack 4D tensors when `cu_seqlens` was provided, but 4D tensors in `sbhd`/`bshd` format should be handled with a simple layout conversion (as the flagos backend does).
## Test plan
- [x] Tested with batch_size=4, verified no ValueError
- [x] Results match flagos backend output
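A minimal sketch of the layout handling described above; the helper names are illustrative, not the actual backend code.

```python
def primary_layout(qkv_layout: str) -> str:
    # A combined layout string such as "sbhd_sbhd_sbhd" describes q/k/v
    # separately; only the first component is needed for layout conversion.
    return qkv_layout.split("_")[0]

def is_packed_thd(qkv_layout: str) -> bool:
    # 4D sbhd/bshd tensors get a plain layout transpose; only true "thd"
    # inputs are treated as packed variable-length sequences.
    return primary_layout(qkv_layout) == "thd"

assert primary_layout("sbhd_sbhd_sbhd") == "sbhd"
assert not is_packed_thd("bshd_bshd_bshd")
```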
- Remove the flag_gems.use_gems() context to avoid context-switching overhead
- Call flag_gems.xxx directly wherever possible
# Description
Add the new vendor backend METAX.
## Type of change
- [x] New feature (non-breaking change which adds functionality)
## Changes
- Add the Metax ops registration
- Add the Metax backend implementation
- Register Metax ops in builtin_ops.py
## Requirements
- The module transformer_engine_torch_metax is required; to use it, install the transformer_engine_metax package.
…lagos-ai#23)
## Summary
- flagos: Add multi_tensor_adam_param_remainder implementation
- reference: Add multi_tensor_adam_param_remainder implementation
- reference: Add context parallel support for Flash Attention
- manager: Add a cache mechanism with _impl_cache and _impl_cache_meta for conditional op selection (see the sketch below)
## Changes
### flagos backend
- Implemented the multi_tensor_adam_param_remainder operation for handling parameter remainders in the multi-tensor Adam optimizer
### reference backend
- Implemented the multi_tensor_adam_param_remainder operation
- Added context parallel support for the Flash Attention implementation
### Core manager
- Added a cache mechanism using _impl_cache and _impl_cache_meta
- Improved op selection with conditional caching based on the policy fingerprint and epoch
Signed-off-by: wenone766 <wenone766@wenobug.com>
Co-authored-by: wenone766 <wenone766@wenobug.com>
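An illustrative sketch of the conditional caching idea, keyed by policy fingerprint and epoch; the class and method names are assumptions, not the actual OpManager code.

```python
class _CachedSelector:
    """Cache the resolved implementation per op until the policy changes."""

    def __init__(self):
        self._impl_cache = {}        # op_name -> resolved implementation
        self._impl_cache_meta = {}   # op_name -> (policy_fingerprint, epoch)

    def select(self, op_name, policy_fingerprint, epoch, resolve):
        meta = self._impl_cache_meta.get(op_name)
        if meta == (policy_fingerprint, epoch) and op_name in self._impl_cache:
            return self._impl_cache[op_name]       # fast path: policy unchanged
        impl = resolve(op_name)                    # slow path: full policy resolution
        self._impl_cache[op_name] = impl
        self._impl_cache_meta[op_name] = (policy_fingerprint, epoch)
        return impl
```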
- Fix enum mismatch between ```transformer_engine/plugin/core/ops.py``` and ```transformer_engine/common/include/transformer_engine/xxx.h```
# Description
Add the KUNLUNXIN vendor backend.
Fix kunlunxin register errors
# Description
Validate TE QA test cases with new CI workflows.
## Type of change
- [x] Infra/Build change
## Changes
- Added code inspection and PyTorch/C++ unit tests to improve the TE testing system
- Implemented end-to-end automation of TE wheel package building, installation, and verification, supporting multiple versions of Flash Attention and GPUs with different CUDA architectures
- Verified TE's core functions (distributed communication, matrix multiplication, ONNX export) and compatibility with Megatron-LM/Lightning-Thunder
- Completed the verification of the nvinspect debugging tool and re-verification of core numerical tests
Co-authored-by: zihugithub <fbye@baai.ac.cn>
Co-authored-by: liyuzhuo <lee.yuzhuo233@gamil.com>
…os-ai#36)
## Summary
Refactor and improve the FlagOS optimizer and multi_tensor implementations to better match CUDA behavior and improve code quality.
## Changes
### `fused_adam.py` (FlagOS backend)
- Remove unused `inv_scale` and `out_dtype` parameters from `multi_tensor_adam_fl`
- `multi_tensor_adam_param_remainder_fl`: rewrite FP32 master weight reconstruction using bit manipulation (int16 high/low bits), matching the CUDA implementation exactly (see the sketch below)
### `multi_tensor.py` (FlagOS backend)
- `multi_tensor_l2_norm_fl`: add proper type hints, a noop_flag check, inf/nan detection, and replace raw `**` / `+` operators with `flag_gems.mul` / `flag_gems.add`
- `multi_tensor_scale_fl`: add type hints, a noop_flag check, inf/nan detection, and replace `src * scale` with `flag_gems.mul(src, scale)`
### `optimizer.py` (reference backend)
- Update `multi_tensor_l2norm_torch` and `multi_tensor_adam_torch` to match the new signatures and CUDA behavior (L2 vs AdamW mode split)
- Rewrite `multi_tensor_adam_param_remainder_torch` with bit manipulation matching CUDA
- Rename `eps` → `epsilon` for consistency
### `optimizers/__init__.py`
- Export `multi_tensor_scale` and `multi_tensor_l2norm`
### Misc
- Fix missing newline at end of files
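A rough sketch of the int16 high/low-bit reconstruction idea, assuming the FP32 master weight is split into a BF16 parameter (high 16 bits) and an int16 remainder (low 16 bits); the function name and exact conventions are assumptions, not the TE-FL API.

```python
import torch

def reconstruct_fp32_master(param_bf16: torch.Tensor, remainder_i16: torch.Tensor) -> torch.Tensor:
    # Reinterpret the BF16 bits as the top half of an FP32 word and OR in the
    # stored low half; the result views back as the full-precision master weight.
    high = param_bf16.view(torch.int16).to(torch.int32) << 16
    low = remainder_i16.to(torch.int32) & 0xFFFF
    return (high | low).view(torch.float32)

# Round-trip check on a toy tensor.
w = torch.randn(4, dtype=torch.float32)
bits = w.view(torch.int32)
param = (bits >> 16).to(torch.int16).view(torch.bfloat16)   # high 16 bits
remainder = (bits & 0xFFFF).to(torch.int16)                 # low 16 bits
assert torch.equal(reconstruct_fp32_master(param, remainder), w)
```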
# Description
Add Musa backend.
TE-FL at the Python level now supports multiple platforms, via the following changes:
1. Support for vendor-specific patches: vendors can now add their own
patches, e.g., patching ```torch.cuda``` to ```torch.musa```. For the patch
implementation, please refer to
```transformer_engine/plugin/core/backends/vendor/musa/musa_patches.py```;
for the patch integration, please refer to
```transformer_engine/__init__.py```. A sketch of both mechanisms is shown after this list.
2. Abstraction of CUDA device references: files under
```transformer_engine/``` now abstract CUDA device-related code behind
```te_device_type```. For example, ```torch.device("cuda")``` is now
replaced with ```torch.device(te_device_type)```.
3. Fixes:
- FlagOS backend: ```get_num_cublas_stream``` and ```get_cudnn_version```
- Reference backend: ```get_num_cublas_stream``` and ```scaled_mask_softmax_forward```
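A minimal sketch of both mechanisms, with assumed import paths and helper bodies (the real patch lives in musa_patches.py and the accessor is wired up in transformer_engine/__init__.py):

```python
import torch

def apply_musa_patches() -> None:
    """Illustrative vendor patch: route torch.cuda calls to the MUSA runtime."""
    try:
        import torch_musa  # vendor runtime package; an assumption, not shipped with TE-FL
        torch.cuda = torch.musa  # existing torch.cuda.* call sites now hit MUSA
    except ImportError:
        pass  # no MUSA runtime present; leave torch.cuda untouched

def te_device_type() -> str:
    """Illustrative accessor: report the active device type instead of hardcoding 'cuda'."""
    return "musa" if hasattr(torch, "musa") else "cuda"

# Device-creating code then becomes vendor-agnostic, e.g.:
# buf = torch.empty(16, device=torch.device(te_device_type()))
```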
…-ai#52)
Add two functions for the flagos backend, based on flag_gems:
- scaled_masked_softmax_forward
- scaled_masked_softmax_backend
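For reference, a plain-PyTorch sketch of the forward semantics, assuming mask entries equal to 1 mark positions to exclude; the flagos version routes the elementwise and softmax work through flag_gems kernels instead of eager torch.

```python
import torch

def scaled_masked_softmax_forward(x: torch.Tensor, mask: torch.Tensor, scale: float) -> torch.Tensor:
    # Scale the logits, mask out excluded positions, then softmax over the last dim.
    x = x * scale
    x = x.masked_fill(mask.bool(), float("-inf"))
    return torch.softmax(x, dim=-1)
```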
- Fix quantizer dtype attr conversion errors for vendor backends
- Polish logger for vendor backend
- Add the ```te_general_grouped_gemm``` op for the flagos backend, based on flag_gems (see the sketch below)
- Support both forward and backward computation, distinguished by ```grad```
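A simplified pure-PyTorch stand-in for the grouped GEMM idea; the real flagos op dispatches to flag_gems kernels, and the exact meaning of `grad` follows TE's GEMM convention rather than the illustrative choice made here.

```python
from typing import List
import torch

def grouped_gemm(a_list: List[torch.Tensor], b_list: List[torch.Tensor],
                 grad: bool = False) -> List[torch.Tensor]:
    outs = []
    for a, b in zip(a_list, b_list):
        # Illustrative: treat grad=True as the data-grad GEMM, which consumes
        # the second operand transposed.
        outs.append(a @ (b.transpose(-2, -1) if grad else b))
    return outs

# Example: two "experts" with different token counts sharing one weight shape.
a = [torch.randn(8, 64), torch.randn(4, 64)]
w = [torch.randn(64, 32), torch.randn(64, 32)]
y = grouped_gemm(a, w)                                            # forward: [8x32, 4x32]
dx = grouped_gemm([torch.randn(8, 32), torch.randn(4, 32)], w, grad=True)  # data-grad: [8x64, 4x64]
```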
# Description
This PR implements and integrates the **Metax (MACA)** workflow into
TransformerEngine-FL. It enables automated CI/CD pipelines, functional
training tests, and unit tests specifically optimized for Metax hardware
environments.
**Key updates in this version:** Successful TE compilation on Metax and
alignment with NVIDIA's standard QA workflows.
Fixes # (issue_number_if_applicable)
## Type of change
- [x] New feature (non-breaking change which adds functionality)
- [x] Infra/Build change (changes to CI/CD workflows or build scripts)
- [ ] Documentation change
- [ ] Bug fix
- [ ] Code refactoring
## Changes
### 1. Build & Compilation
- **TE Build Completion**: Successfully completed the compilation and
build process for TransformerEngine on the Metax platform.
- **Workflow Alignment**: Designed the Metax testing workflow based on
NVIDIA's `qa-l0-te-cpp-unittest-pytorch-lint` standard to ensure parity
with upstream quality gates.
### 2. CI/CD Infrastructure & Test Modules
- **Metax Platform Support**: Added `configs/metax.yml` to define
Metax-specific runner labels, images, and device configurations.
- **Verified Workflow Modules**: The following modules have been
implemented and verified on the Metax platform:
- **pytorch-lint**: Static code analysis and linting.
- **pytorch-debug**: Debug-level build and basic functional
verification.
- **pytorch-unittest**: Core unit testing for Metax-adapted operators.
- **Workflow Modularization**:
- Introduced `configs/all_tests_common.yml` and
`configs/unit_tests_common.yml` for reusable test logic.
- Added `configs/all_tests_metax.yml` as the dedicated entry point for
Metax functional testing.
### 3. Environment & Runtime Fixes
- **Image Management**: Implemented `image-pull-policy: never` and
`--pull never` options to force the use of local registry images
(localhost:5000), optimizing startup time in local cluster environments.
- **Dynamic Resource Scaling**:
- Adapted `torchrun` and training scripts to support dynamic
GPU/Accelerator counts (specifically for C500 clusters).
- Removed hardcoded GPU host configurations to improve portability
across different Metax nodes.
### 4. Cleanup
- Removed legacy CUDA/Ascend specific configurations from the Metax
workflow path to prevent environment contamination.
## Hardware/Environment Verified
- **Platform**: Metax MACA
- **Accelerator**: C500
- **Registry**: Local Registry (localhost:5000)
---
## TODO / Next Steps
- [ ] Integrate the Metax-specific adaptation workflow into the central
platform.
- [ ] Generate and upload comprehensive Benchmark and Performance test
reports.
# Checklist:
- [x] I have read and followed the contributing guidelines.
- [x] The functionality is complete and verified on Metax hardware.
- [x] I have commented my code, particularly in hardware-specific
adaptation areas.
- [x] My changes generate no new warnings.
- [x] I have added/updated tests that prove my feature works on the MACA
platform.
- [x] New and existing unit tests (Lint, Debug, Unittest) pass locally
with Metax environment.
---------
Co-authored-by: 爱洗澡 qq <aixizaoqq@aixizaodeMacBook-Air.local>
Co-authored-by: zhoujiamei <2867770387@qq.com>
Co-authored-by: zhoujiamei <zjm>
Co-authored-by: peiyu <peiyu@jinglong.ai>
… FlagCICD runner (flagos-ai#58)
# Description
Simplifies and consolidates the coverage report generation logic in the CI unittest workflow, reducing redundant steps and dependencies. The **uploading reports to FlagCICD** step still needs to be tested in the CI environment.
## Type of change
- [x] New feature (non-breaking change which adds functionality)
- [x] Infra/Build change (changes to CI/CD workflows or build scripts)
- [x] Code refactoring
## Changes
- Merged `Generate Coverage Report` into the `Execute Tests` step: coverage `combine` and `json` generation now run inline after `bash test.sh`, following the same pattern as Megatron-LM-FL
- Coverage collection is gated on `test_type == 'unittest'` to avoid running for lint/debug groups, and `pip install` is done only once
- Removed `fetch-depth: 0` from checkout steps (not required for unit test runs)
- Removed unused/leftover scripts from the repository
Resolved 28 merge conflicts:
- P0 (20 files): transformer_engine/pytorch/ — preserved the te_device_type() device abstraction and plugin system hooks
- P1 (1 file): transformer_engine/common/__init__.py — preserved plugin bootstrapping and skip_cuda_build()
- P2 (7 files): CI/CD, qa, config — preserved fork-specific CI and test harness
Updated the plugin OP API layer to match pytorch/csrc/ pybind changes between the base and dev branches. Changes applied to:
- ops.py base class (TEFLBackendBase)
- All 5 vendor backends (cuda, iluvatar, metax, musa, hygon)
- All 5 vendor register_ops.py files
- Scanned flagos/reference backends for changed interfaces (no changes needed)
New APIs added: group_quantize, bgrad_group_quantize, glu, dglu, te_general_grouped_gemm_for_grouped_tensor, te_general_grouped_gemm_for_discrete_in, te_general_grouped_gemm_for_discrete_out, nvfp4_data_transpose, swizzle_scales_for_gemm_, grouped_swizzle_for_gemm, convert_host_pointers_to_tensor, get_device_pointer_for_data_and_scales, splits_to_offsets, mxfp8_scaling_compute_partial_amax, mxfp8_scaling_partial_cast, nvfp4_2d_compute_partial_amax, nvfp4_multi_tensor_compute_partial_amax, nvfp4_compute_global_scale, nvfp4_compute_per_block_scale, nvfp4_expand_scale_to_fp8, nvfp4_fused_scale, nvfp4_multi_tensor_fused_scale, nvfp4_2d_partial_cast, nvfp4_multi_tensor_2d_partial_cast, nvfp4_2d_multi_tensor_transpose, multi_tensor_scale_tensor, multi_tensor_compute_scale_inv_e8m0
Modified APIs: split_quantize (added disable_bulk_allocation param)
…ype()
Scanned Python-layer diff (base..dev, excluding csrc) for newly introduced
hardcoded 'cuda' device strings. Replaced 11 instances across 7 files:
- device=torch.device('cuda') → device=torch.device(te_device_type()): 3
- device='cuda' → device=te_device_type(): 1
- .device.type == 'cuda' → .device.type == te_device_type(): 2
- get_autocast_dtype('cuda') → get_autocast_dtype(te_device_type()): 5
Skipped 10 intentional default parameter values and docstrings.
torch.cuda.* API calls left as-is (handled by vendor patches.py at runtime).
Scanned fork-specific code (new in merge vs dev) for references to
functions, classes, and file paths that upstream renamed or relocated
between base and dev. Fixed 6 stale reference(s):
- _load_cudnn() → _load_cuda_library("cudnn")
- _load_nvrtc() → _load_cuda_library("nvrtc")
- _load_curand() → _load_cuda_library("curand")
- _load_nvidia_cuda_library("cublas"/"cuda_runtime") → _load_cuda_library_from_python()
- tensor.quantized_tensor → quantized_tensor (pytorch/utils.py)
- tensor.quantized_tensor → quantized_tensor (flagos backends.py)
Updated the plugin OP API layer to match pytorch/csrc/ pybind changes between the base and dev branches. Changes applied to:
- ops.py base class (TEFLBackendBase): added cuda_graph, deterministic to get_fused_attn_backend
- ops.py FlashAttentionBase: added num_splits to forward/_forward_impl signatures
- All vendor FlashAttention subclasses (cuda, hygon, metax, musa, kunlunxin)
- All 5 vendor backends' get_fused_attn_backend (cuda, iluvatar, metax, musa, hygon)
- Reference and flagos backends updated for both APIs
- Verified get_attention_backend/AttentionParams pass-through (no changes needed)
See /tmp/plugin_api_changes.log for details.
…_attn_fwd/bwd
Found during batch validation combo 2/9 (te_fl_prefer=vendor, attention_backend=fused, attempt 1).
Error: CUDABackend.fused_attn_fwd() takes 29 positional arguments but 31 were given
Root cause: the upstream merge added bottom_right_diagonal and cuda_graph params to the caller (cpp_extensions/fused_attn.py) but the plugin backend signatures were not updated.
Fix: added both params to the ops.py base class, the CUDA backend, and all vendor backends (musa, iluvatar, hygon, metax) for both fused_attn_fwd and fused_attn_bwd.
…led() in flagos backend
Found during batch validation combo 4/9 (te_fl_prefer=flagos, attention_backend=flash, attempt 1).
Error: Cached implementation 'default.flagos' failed for op 'get_flash_attention_class': cannot import name 'CPUOffloadEnabled' from 'transformer_engine.pytorch.cpu_offload'
Root cause: upstream removed CPUOffloadEnabled from cpu_offload.py (v2 API), replacing it with the is_cpu_offload_enabled() function.
Fix: updated the flagos backend to use the new function (a minimal shim is sketched below).
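A minimal compatibility shim illustrating the fix, using only the two names mentioned above (the actual change simply updates the flagos call site to the new function):

```python
try:
    # TE v2 API: the CPU offload state is queried through a function.
    from transformer_engine.pytorch.cpu_offload import is_cpu_offload_enabled
except ImportError:
    # Older TE exposed a module-level flag instead; wrap it for a uniform call site.
    from transformer_engine.pytorch.cpu_offload import CPUOffloadEnabled

    def is_cpu_offload_enabled() -> bool:
        return bool(CPUOffloadEnabled)
```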
xmhubj (Collaborator) reviewed on Apr 15, 2026:
This configuration file is not derived from the template used by cuda.yml and meta.yml.
…metax runner (flagos-ai#60)
## Description
Refactors CI/CD workflows to support both CUDA (NVIDIA A100) and Metax (C500) platforms, removes obsolete workflows, and fixes several platform-specific test failures. Adds functional testing and log reporting, with significant workflow simplification; the Metax platform uses BAAI runner configs.
## Type of change
- [x] New feature (non-breaking change which adds functionality)
- [x] Infra/Build change (changes to CI/CD workflows or build scripts)
- [x] Code refactoring
- [x] Bug fix
## Changes
- **Workflow cleanup**: Removed 7 obsolete workflows; extracted lint into a standalone reusable `lint_common.yml` (runs in parallel); added `integration_tests_common.yml`
- **Platform refactoring**: Added per-platform setup scripts (`setup_cuda.sh` / `setup_metax.sh`); switched the Metax config to the BAAI online environment; removed unsupported test types (JAX distributed) from the Metax matrix
- **Bug fixes**:
  - Metax: skip incompatible distributed test files (`test_numerics`, `test_torch_fsdp2`, etc.) to prevent `torchrun` SIGSEGV
  - Metax: replace the `nvidia-smi`-only FP8 detection with a platform-aware check
  - CUDA: fix `libcudart` load failure when the runtime is pip-installed (add a proper fallback chain in `_load_cudart()` and `try_load_lib`)
Co-authored-by: qqjxzxq <1376782660@qq.com>
Co-authored-by: HermiaHuan <3081497279@qq.com>
Tree replacement merge from merge/dev-to-main-20260410. Working tree is identical to the source branch. Stages 1-8 completed and verified.
- Remove unused imports in utils.py, multi_head_attention.py, float8_blockwise_tensor.py
- Reorder imports to follow the stdlib → third-party → first-party → local convention
- Fixes CI lint failures while maintaining a 10.00/10 pylint score
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nown type warnings