[Bug] ExpandMixedKernel fails with "Tensor view not found" for V→C pattern in split=UP_DOWN block #965

@zhangqi-chen

Component

Codegen

Description

When a block uses pl.at(level=pl.Level.CORE_GROUP, split=pl.SplitMode.UP_DOWN) (or pl.incore(split=...)), and vector ops write to a GM tensor via pl.assemble while cube ops then read the same GM tensor via pl.slice as matmul input, the ExpandMixedKernel pass fails with:

Tensor view not found for parameter: mid__tile

The pattern is: vector writes GM tensor → cube reads same GM tensor within a single split block. The compiler does not generate a tensor view mapping for the GM tensor on the cube (AIC) side.

This is a separate issue from #963 (which involves slicing matmul output). Here the matmul input comes from a GM tensor that was just written by vector ops in the same block.

Steps to Reproduce

Minimal reproducer (examples/beginner/vc_mixed_test.py in pypto-lib):

import pypto.language as pl

# Problem sizes: x is [M, K], w is [K, N], out is [M, N].
M, K, N = 16, 128, 64

@pl.program
class VCTest:
    @pl.function(type=pl.FunctionType.Opaque)
    def vc_test(
        self,
        x: pl.Tensor[[M, K], pl.FP32],
        w: pl.Tensor[[K, N], pl.BF16],
        out: pl.Out[pl.Tensor[[M, N], pl.FP32]],
    ) -> pl.Tensor[[M, N], pl.FP32]:
        # Intermediate GM tensor handed from the vector side to the cube side.
        mid = pl.create_tensor([M, K], dtype=pl.BF16)
        with pl.at(level=pl.Level.CORE_GROUP, split=pl.SplitMode.UP_DOWN):
            # Vector (V) side: scale x, cast to BF16, store to GM via assemble.
            x_tile = pl.slice(x, [M, K], [0, 0])
            scaled = pl.mul(x_tile, 0.5)
            scaled_bf16 = pl.cast(scaled, target_type=pl.BF16)
            mid = pl.assemble(mid, scaled_bf16, [0, 0])
            # Cube (C) side: load the same GM tensor via slice as matmul input.
            a = pl.slice(mid, [M, K], [0, 0])
            b = pl.slice(w, [K, N], [0, 0])
            c = pl.matmul(a, b, out_dtype=pl.FP32)
            out = pl.assemble(out, c, [0, 0])
        return out

Run: python vc_mixed_test.py -p a2a3

Note: Splitting into two separate pl.at() blocks (one for vector, one for cube) compiles and runs correctly.

Expected Behavior

The ExpandMixedKernel pass should handle the V→C GM tensor handoff within a single split block: vector side writes mid via assemble (store to GM), cube side reads mid via slice (load from GM). The kernel should compile successfully.
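
Once the single-block version compiles, its output could be checked against a host-side reference. The following is a minimal pure-Python sketch of the computation in the reproducer; it is not part of pypto. The helper names to_bf16 and vc_reference are hypothetical. The sketch assumes round-to-nearest-even for the FP32 to BF16 cast (the rounding mode of pl.cast is not stated in this issue), finite inputs, and weights that are already BF16-representable. It accumulates in Python doubles rather than FP32, so any comparison with device output should use a tolerance.

```python
import struct


def to_bf16(value: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    Assumption: this emulates the FP32 -> BF16 cast on the host; the rounding
    mode actually used by pl.cast is not confirmed by the issue.
    """
    # Reinterpret the value as the 32-bit pattern of an IEEE 754 float32.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Adding 0x7FFF plus the lowest kept bit implements round-to-nearest-even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    # Keep only the upper 16 bits (sign, exponent, 7 mantissa bits).
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def vc_reference(x, w):
    """Host-side model of the reproducer: out = bf16(x * 0.5) @ w.

    x is a list of M rows with K floats each, w is a list of K rows with
    N floats each. Returns a list of M rows with N floats each.
    """
    k = len(w)
    n = len(w[0])
    # Vector stage: scale by 0.5, then cast to BF16 (the tensor named mid).
    mid = [[to_bf16(v * 0.5) for v in row] for row in x]
    # Cube stage: matmul of mid with w, accumulated in Python doubles.
    return [
        [sum(row[p] * w[p][j] for p in range(k)) for j in range(n)]
        for row in mid
    ]
```

For example, vc_reference([[2.0, 4.0]], [[1.0], [0.5]]) models a 1x2 input and a 2x1 weight. For the shapes in the reproducer, x would have 16 rows of 128 values and w 128 rows of 64 values.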

Actual Behavior

Failed to compile group 'vc_test_incore_0':
Tensor view not found for parameter: mid__tile

Git Commit ID

babf158

NPU Kind

Ascend 910C

Host Platform

Linux (aarch64)

Additional Context

  • Related: #963, "[Bug] ExpandMixedKernel drops matmul when output is sliced before vector consumption" (the C→V direction)
  • This issue is the V→C direction: vector writes a GM tensor, cube reads it
  • pypto branch: feat/incore-split-param
  • pypto-lib commit: 8cfb7c0
  • Also encountered in qwen3_32b_decode_scope2.py when attempting to merge softmax (Stage 3, vector) + SV matmul (Stage 4, cube) into a single split block

Metadata

Labels

bug

Status

Ready