
add kunlunxin vendor op #66

Open
sunge666-ui wants to merge 2 commits into flagos-ai:main from sunge666-ui:main

Conversation

@sunge666-ui

Description

Add binding code for some kunlunxin ops.

Type of change

  • [x] New feature (non-breaking change which adds functionality)

Changes

Add kunlunxin backend binding support.
Add kunlunxin op bindings and registration (see the sketch below).
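
For illustration, here is a minimal sketch of the bind-and-register pattern these changes add. All names below are hypothetical placeholders, not the actual FlagOS API.

# Hypothetical sketch only -- illustrative names, not the real FlagOS registration API.
from typing import Callable, Dict

# Registry mapping op names to vendor-provided implementations.
_VENDOR_OPS: Dict[str, Callable] = {}

def register_vendor_op(name: str):
    """Bind a kunlunxin kernel implementation under the given op name."""
    def decorator(fn: Callable) -> Callable:
        _VENDOR_OPS[name] = fn
        return fn
    return decorator

@register_vendor_op("cast_to_fp8")
def cast_to_fp8_klx(*args, **kwargs):
    # A real binding would dispatch into the kunlunxin kernel here.
    raise NotImplementedError("placeholder for the vendor kernel call")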

Checklist:

  • [x] I have read and followed the contributing guidelines
  • [x] The functionality is complete
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] New and existing unit tests pass locally with my changes

    inv_scale,
)

def cast_to_fp8(
Collaborator

Where is the cast_to_fp8 op used in TransformerEngine?

Author

The fp8_utils.py module in Megatron-LM-FL requires the use of cast_to_fp8; specifically: from transformer_engine.pytorch.cpp_extensions import cast_to_fp8.

Collaborator

Megatron-LM-FL includes a version check for Transformer Engine (TE). For TE versions >= 2.0, QuantizedTensor is used directly; for versions between 1.0 and 2.0, cast_to_fp8 is used instead. Since TE-FL currently targets TE V2.9, the cast_to_fp8 path should no longer need to be retained, right?

[Screenshot: the TE version check in megatron/core/fp8_utils.py]
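
For context, here is a sketch of the version gate described above, reconstructed from this thread rather than copied from Megatron-LM-FL; the QuantizedTensor import path is an assumption.

# Sketch of the TE version check described in this thread; assumed, not
# copied from megatron/core/fp8_utils.py.
from packaging.version import Version
import transformer_engine as te

if Version(te.__version__) >= Version("2.0"):
    # TE >= 2.0: quantization goes through QuantizedTensor directly
    # (assumed import path).
    from transformer_engine.pytorch.tensor import QuantizedTensor
else:
    # 1.0 <= TE < 2.0: fall back to the cast_to_fp8 extension.
    from transformer_engine.pytorch.cpp_extensions import cast_to_fp8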

Author

We haven't upgraded to the latest version of TE yet, so for the time being, we can only bind this cast_to_fp8 function; the upgrade to the newer version of TE is still in progress.

ext = get_ext()

try:
    import transformer_engine_klx_torch
@aoyulong May 12, 2026

We don't actually recommend using transformer_engine_klx_torch directly; it's better to call the kernels or operators provided by the vendor, which avoids the overhead and the complexity of version management.

Author

transformer_engine_klx_torch is currently an integral part of our kernel; its format was designed based on the conventions adopted by other vendors.
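
For reference, a guarded form of the import shown in the diff above; the except branch and the helper below are assumed completions, not code from this PR.

# Guarded vendor-extension import; the fallback branch is an assumption.
try:
    import transformer_engine_klx_torch
except ImportError:
    transformer_engine_klx_torch = None  # extension unavailable on this platform

def has_klx_extension() -> bool:
    # Hypothetical helper: report whether the kunlunxin TE extension loaded.
    return transformer_engine_klx_torch is not None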

