When using pl.at(level=pl.Level.CORE_GROUP, split=pl.SplitMode.UP_DOWN) (or pl.incore(split=...)) in a block where vector ops write to a GM tensor via pl.assemble, and cube ops subsequently read from the same GM tensor via pl.slice for matmul input, the ExpandMixedKernel pass fails with:
Tensor view not found for parameter: mid__tile
The pattern is: vector writes GM tensor → cube reads same GM tensor within a single split block. The compiler does not generate a tensor view mapping for the GM tensor on the cube (AIC) side.
This is a separate issue from #963 (which involves slicing matmul output). Here the matmul input comes from a GM tensor that was just written by vector ops in the same block.
Steps to Reproduce
Minimal reproducer (examples/beginner/vc_mixed_test.py in pypto-lib). Run:
python vc_mixed_test.py -p a2a3
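The reproducer source is not inlined here; as a rough pseudocode sketch of the failing pattern (only `pl.at`, `pl.Level.CORE_GROUP`, `pl.SplitMode.UP_DOWN`, `pl.assemble`, and `pl.slice` come from this report — every other name, argument, and signature is an assumption):

```
# Pseudocode sketch, not the actual vc_mixed_test.py
with pl.at(level=pl.Level.CORE_GROUP, split=pl.SplitMode.UP_DOWN):
    # Vector side: compute tiles and store them to the GM tensor `mid`
    pl.assemble(mid, ...)           # vector op writes GM

    # Cube side: read the same GM tensor back as matmul input
    lhs = pl.slice(mid, ...)        # fails: "Tensor view not found for parameter: mid__tile"
    ...                             # cube matmul consuming `lhs`
```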
Note: Splitting into two separate pl.at() blocks (one for vector, one for cube) compiles and runs correctly.
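The working split described in the note above keeps the vector write and the cube read in separate blocks (same pseudocode caveats; whether the split= argument is still passed to each block is not stated in this report):

```
# Compiles and runs correctly: two separate pl.at() blocks
with pl.at(...):
    pl.assemble(mid, ...)       # vector block: store mid to GM

with pl.at(...):
    lhs = pl.slice(mid, ...)    # cube block: load mid from GM for matmul
```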
Expected Behavior
The ExpandMixedKernel pass should handle the V→C GM tensor handoff within a single split block: vector side writes mid via assemble (store to GM), cube side reads mid via slice (load from GM). The kernel should compile successfully.
Actual Behavior
Failed to compile group 'vc_test_incore_0':
Tensor view not found for parameter: mid__tile
This issue is the V→C direction: vector writes a GM tensor and cube then reads it.
Component
Codegen
Git Commit ID
babf158
NPU Kind
Ascend 910C
Host Platform
Linux (aarch64)
Additional Context
pypto branch: feat/incore-split-param
pypto-lib commit: 8cfb7c0
Also encountered in qwen3_32b_decode_scope2.py when attempting to merge softmax (Stage 3, vector) and SV matmul (Stage 4, cube) into a single split block.