
feat(ir): Added tensor.expand_clone ops#851

Open
wuzhf9 wants to merge 1 commit into hw-native-sys:main from wuzhf9:issue679

Conversation

Contributor

@wuzhf9 wuzhf9 commented Apr 2, 2026

Summary

  • add tensor.expand_clone operation

Testing

  • All tests pass
  • Code review completed
  • Documentation updated

Related Issues

Fixes #679


coderabbitai bot commented Apr 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

This PR adds a new unified op expand_clone (tensor + tile), implements type inference and registrations for tensor.expand_clone and tile.expand_clone, centralizes index/shape helpers, registers a tensor→tile conversion, adds a SubstituteTiles IR pass that lowers tile.expand_clone to tile.create/tile.assemble (with an optional loop for single-axis broadcast), exposes the feature in Python, and covers it with unit tests.
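As a rough reference for the semantics summarized above, the following NumPy sketch mimics what expand_clone computes. This is an assumption drawn from the walkthrough (data cloned along at most one broadcast axis); NumPy and the helper name `expand_clone_reference` are illustrative and not part of the PR.

```python
# Reference sketch (assumed semantics): clone the input along the single axis
# where its extent is 1, matching the loop-of-assembles lowering described above.
import numpy as np

def expand_clone_reference(src: np.ndarray, shape: tuple) -> np.ndarray:
    """Clone src to `shape`, allowing at most one broadcast axis of extent 1."""
    assert len(shape) == src.ndim, "rank must match"
    broadcast_axes = [i for i, (s, t) in enumerate(zip(src.shape, shape)) if s != t]
    assert len(broadcast_axes) <= 1, "at most one broadcast axis"
    if not broadcast_axes:
        return src.copy()                 # no broadcast: a single assemble/copy
    axis = broadcast_axes[0]
    assert src.shape[axis] == 1, "broadcast axis must have extent 1 in the input"
    reps = [1] * src.ndim
    reps[axis] = shape[axis]
    return np.tile(src, reps)             # loop of per-iteration clones

x = np.arange(8).reshape(4, 1, 2)
y = expand_clone_reference(x, (4, 3, 2))
print(y.shape)  # (4, 3, 2)
```

The no-broadcast branch corresponds to the pass emitting a single tile.assemble; the tiled branch corresponds to the INDEX loop with per-iteration offsets.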

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Build | CMakeLists.txt | Include src/ir/transforms/substitute_tiles_pass.cpp in the build. |
| Docs | docs/en/dev/passes/00-pass_manager.md, docs/zh-cn/dev/passes/00-pass_manager.md | Document the SubstituteTiles pass and insert it at the start of the PTO tile-stage pipeline; list required/produced properties. |
| Pass API & Properties | include/pypto/ir/transforms/passes.h, include/pypto/ir/transforms/pass_properties.h | Add the pass::SubstituteTiles() declaration and kSubstituteTilesProperties. |
| Type-inference utilities (header) | include/pypto/ir/type_inference.h | Expose helpers: NormalizeAxis, ComputeShapeProduct, IsIndexLikeDtype, InferTileLayoutFromShape, ValidateIndexTupleElements. |
| Type-inference utilities (impl) | src/ir/op/type_inference.cpp | Implement the above helpers with bounds checking and layout inference. |
| Tensor op: expand_clone | src/ir/op/tensor_ops/broadcast.cpp, python/pypto/ir/op/tensor_ops.py, tests/ut/ir/operators/test_tensor_ops.py | Add the tensor.expand_clone op, type deduction validating rank/broadcast rules, a Python binding, and a unit test. |
| Tile op: expand_clone | src/ir/op/tile_ops/broadcast.cpp, python/pypto/ir/op/tile_ops.py, tests/ut/ir/operators/test_tile_ops.py | Add the tile.expand_clone op, type deduction enforcing single-axis broadcast and layout inference, a Python binding, and a unit test. |
| Unified language API | python/pypto/language/op/unified_ops.py, python/pypto/language/op/tensor_ops.py, python/pypto/language/op/tile_ops.py, python/pypto/language/op/__init__.py, python/pypto/language/__init__.py | Add the pl.expand_clone unified dispatcher and export it. |
| Python pass bindings & typing | python/bindings/modules/passes.cpp, python/pypto/pypto_core/passes.pyi | Expose the passes.substitute_tiles binding and its typing. |
| Pass manager | python/pypto/ir/pass_manager.py | Insert SubstituteTiles into tile_pto_passes before FlattenTileNdTo2D. |
| Op conversion registry | src/ir/transforms/op_conversion_registry.cpp | Register the 1:1 conversion tensor.expand_clone → tile.expand_clone. |
| Transform pass implementation | src/ir/transforms/substitute_tiles_pass.cpp | New SubstituteTiles pass that replaces tile.expand_clone with tile.create + tile.assemble and, if needed, an INDEX loop producing per-iteration offsets; enforces broadcast constraints and emits SSA-form statements. |
| Transform cleanup | src/ir/op/tensor_ops/transform.cpp, src/ir/op/tile_ops/transform.cpp | Remove duplicate local helpers; include the centralized type_inference.h. |
| Tests for pass | tests/ut/ir/transforms/test_substitute_tiles.py | Add tests validating broadcast/no-broadcast lowering, presence/absence of tile.expand_clone, presence of tile.assemble, and structural equality. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as IR Input
    participant PassMgr as Pass Manager
    participant SubPass as SubstituteTiles Pass
    participant Visitor as IR Visitor
    participant Builder as IR Builder
    participant Output as Transformed IR

    Client->>PassMgr: submit function with tile.expand_clone
    PassMgr->>SubPass: run SubstituteTiles
    activate SubPass

    SubPass->>Visitor: traverse statements
    activate Visitor

    loop for each tile.expand_clone call
        Visitor->>Visitor: validate call (arity, types, literal shape)
        Visitor->>Visitor: compute broadcast axis (≤1) using type-inference helpers
        Visitor->>Builder: request replacement IR
        activate Builder

        alt broadcast axis present
            Builder->>Builder: emit tile.create (dest)
            Builder->>Builder: emit for i in 0..N loop (INDEX)
            Builder->>Builder: per-iter offsets with loop var
            Builder->>Builder: emit tile.assemble inside loop, yield
        else no broadcast
            Builder->>Builder: emit tile.create
            Builder->>Builder: emit single tile.assemble with zeros offsets
        end

        Builder-->>Visitor: return replacement statements
        deactivate Builder
        Visitor->>Visitor: splice replacement into AST
    end

    Visitor-->>SubPass: traversal complete
    deactivate Visitor

    SubPass-->>Output: transformed IR (no tile.expand_clone)
    deactivate SubPass
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • Hzfengsy
  • lyfne123

"🐰
A hop, a clone, one axis to grow,
Tiles assemble in tidy row.
The pass rewrites with careful art,
One dimension plays its part,
Hooray — the IR gets a brand new start!"

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 58.62%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The PR title describes adding 'tensor.expand_clone ops', but the changeset introduces both tensor and tile variants of expand_clone, plus a substitute_tiles pass that is central to the implementation. | Consider updating the title to 'feat(ir): Add expand_clone ops and substitute_tiles pass' to more accurately reflect the full scope of changes and the key transformation pass. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | The PR fully implements the requirements from issue #679: the expand_clone operation with correct semantics, signature, tensor/tile variants, and the substitute_tiles pass for IR transformation. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly scoped to implementing expand_clone and substitute_tiles: new ops, passes, Python bindings, documentation, and type-inference utilities are all in scope. |
| Description check | ✅ Passed | The PR description is directly related to the changeset, which adds expand_clone operations and the substitute_tiles pass as indicated in the summary. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the expand_clone operation for both tensors and tiles, allowing for data expansion via cloning rather than standard broadcasting. It includes the necessary IR operation registrations, Python bindings, and a new SubstituteTiles pass that lowers tile.expand_clone into a combination of tile.create, tile.assemble, and for loops. Feedback highlights a potential crash in the lowering pass when handling the optional valid_shape argument and suggests adding the NoNestedCalls property to the pass requirements to align with the current implementation's assumptions.

```cpp
int temp_var_id_ = 0;

StmtPtr RewriteExpandClone(const CallPtr& call, const VarPtr& result_var, const Span& span) {
  CHECK(call->args_.size() == 2) << "SubstituteTiles: tile.expand_clone expects 2 arguments";
```
Contributor


high

The tile.expand_clone operation lowering currently only supports 2 arguments, but the tensor.expand_clone operation (which is converted to tile.expand_clone) supports an optional 3rd argument valid_shape. If a tensor operation with valid_shape is converted, this pass will crash. Please update the lowering to handle the optional 3rd argument or ensure it is handled during conversion.
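A hedged Python sketch of the argument handling this comment asks for (names and structure are hypothetical; the real RewriteExpandClone is C++ and its helpers are not shown here):

```python
# Hypothetical sketch: accept 2 or 3 arguments instead of failing an arity
# check, and reject the unsupported valid_shape form with a clear message.
def rewrite_expand_clone(args):
    if len(args) not in (2, 3):
        raise ValueError(f"tile.expand_clone expects 2 or 3 arguments, got {len(args)}")
    src, shape = args[0], args[1]
    valid_shape = args[2] if len(args) == 3 else None
    if valid_shape is not None:
        # Alternative: thread valid_shape through the lowering instead of rejecting it.
        raise NotImplementedError(
            "tensor.expand_clone with valid_shape is not supported in lowering")
    return src, shape

print(rewrite_expand_clone(["%src", "(4, 8)"]))  # ('%src', '(4, 8)')
```

Either branch avoids the crash the reviewer describes: the 3-argument form is surfaced as an explicit, actionable error rather than an internal CHECK failure.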

Comment on lines +88 to +90
```cpp
inline const PassProperties kSubstituteTilesProperties{
    .required = {IRProperty::SSAForm, IRProperty::IncoreTileOps, IRProperty::NormalizedStmtStructure},
    .produced = {IRProperty::SSAForm, IRProperty::IncoreTileOps, IRProperty::NormalizedStmtStructure}};
```
Contributor


medium

The SubstituteTiles pass implementation in substitute_tiles_pass.cpp only overrides VisitStmt_ and assumes that tile.expand_clone calls are not nested within other expressions. Therefore, this pass should explicitly require the NoNestedCalls property to ensure correctness.

Suggested change
```diff
-inline const PassProperties kSubstituteTilesProperties{
-    .required = {IRProperty::SSAForm, IRProperty::IncoreTileOps, IRProperty::NormalizedStmtStructure},
-    .produced = {IRProperty::SSAForm, IRProperty::IncoreTileOps, IRProperty::NormalizedStmtStructure}};
+inline const PassProperties kSubstituteTilesProperties{
+    .required = {IRProperty::SSAForm, IRProperty::IncoreTileOps, IRProperty::NormalizedStmtStructure, IRProperty::NoNestedCalls},
+    .produced = {IRProperty::SSAForm, IRProperty::IncoreTileOps, IRProperty::NormalizedStmtStructure}};
```


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
src/ir/op/tile_ops/broadcast.cpp (1)

153-164: Improve CHECK message formatting for readability.

Lines 153, 158, and 163 are missing a space after op_name, producing messages like tile.expand_clonerequires ....

✏️ Suggested tweak
```diff
-  CHECK(args.size() == 2) << op_name << "requires exactly 2 arguments (input, shape), but got "
+  CHECK(args.size() == 2) << op_name << " requires exactly 2 arguments (input, shape), but got "
                           << args.size();
...
-  CHECK(tile_type) << op_name << "requires first argument to be a TileType, but got "
+  CHECK(tile_type) << op_name << " requires first argument to be a TileType, but got "
                    << args[0]->GetType()->TypeName();
...
-  CHECK(shape_tuple_type) << op_name << "requires shape to be TupleType, but got "
+  CHECK(shape_tuple_type) << op_name << " requires shape to be TupleType, but got "
                           << args[1]->GetType()->TypeName();
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/op/tile_ops/broadcast.cpp` around lines 153 - 164, The CHECK messages
concatenating op_name lack a separating space, producing messages like
"tile.expand_clonerequires..."; update the three CHECK calls that reference
op_name (the one asserting args.size()==2, the one validating tile_type via
As<TileType>, and the one validating shape_tuple_type via As<TupleType>) to
include a space after op_name in the formatted string (e.g., change '" <<
op_name << "requires' to '" << op_name << " requires"') so the operator name and
the rest of the message are separated for readability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/zh-cn/dev/passes/00-pass_manager.md`:
- Line 71: The table row for SubstituteTiles is missing NormalizedStmtStructure
which causes a mismatch with the implementation: update the table row for
"SubstituteTiles" so its input and output columns include
"NormalizedStmtStructure" to match the kSubstituteTilesProperties (which lists
NormalizedStmtStructure as required/produced); ensure the document's SSAForm,
IncoreTileOps entries reflect the same required/produced fields as the code.

In `@python/pypto/language/op/tensor_ops.py`:
- Around line 605-617: The public function expand_clone currently only accepts
shape but the IR op _ir_ops.expand_clone also supports an optional valid_shape;
update the expand_clone signature to accept valid_shape:
Optional[Sequence[IntLike]] = None, normalize it (e.g. via _normalize_intlike)
and pass it to _ir_ops.expand_clone alongside the normalized shape, and update
the docstring/type hints accordingly; keep existing use of input.unwrap() and
return Tensor(expr=call_expr).

In `@python/pypto/language/op/unified_ops.py`:
- Around line 298-304: The unified expand_clone currently only accepts shape so
callers cannot pass a tensor-level valid_shape; update the expand_clone
signature to accept an optional valid_shape (e.g. valid_shape:
Optional[Sequence[IntLike]] = None) and pass it through to the tensor
implementation (_tensor.expand_clone(input, shape, valid_shape)); keep calling
_tile.expand_clone(input, shape) for Tile or forward valid_shape if Tile
supports it. Also update the Type hints and the error message path in
expand_clone to reflect the new parameter so callers can supply valid_shape when
operating on Tensors.

In `@src/ir/op/tensor_ops/broadcast.cpp`:
- Around line 176-180: DeduceTensorExpandCloneType was changed to accept an
optional third valid_shape arg but downstream code (DeduceTileExpandCloneType
and SubstituteTiles::RewriteExpandClone) still expects exactly 2 args, causing a
mismatch; revert the tensor-level op to require exactly 2 arguments for now:
change the argument check in DeduceTensorExpandCloneType back to args.size()==2
and remove/document any mentions of a 3-arg signature (also apply the same
revert to the other two sites you noted around 246-252 and 359-369), or
alternatively fully thread the third argument through DeduceTileExpandCloneType
and SubstituteTiles::RewriteExpandClone before exposing valid_shape — reference
DeduceTileExpandCloneType and SubstituteTiles::RewriteExpandClone when making
the consistent change.

In `@src/ir/transforms/op_conversion_registry.cpp`:
- Line 152: The current RegisterSimple("tensor.expand_clone",
"tile.expand_clone") must be replaced with a custom conversion because
DeduceTensorExpandCloneType accepts 2 or 3 args while tile.expand_clone only
accepts 2; implement a RegisterCustom conversion for "tensor.expand_clone" that
inspects the Relay Call (or Expr) argument count: if args.size() == 2, lower to
a tile.expand_clone call with the two arguments; if args.size() == 3 either (a)
emit a clear CHECK/LOGGED_ERROR rejecting the 3-arg form (e.g.
"tensor.expand_clone with valid_shape is not supported in lowering") or (b)
propagate the third valid_shape through and construct a tile.expand_clone
variant that accepts it (update type deduction accordingly). Reference
DeduceTensorExpandCloneType to keep type behavior consistent and replace the
RegisterSimple usage with this custom handler.

In `@src/ir/transforms/substitute_tiles_pass.cpp`:
- Around line 49-62: ExtractConstShape currently forces a literal MakeTuple of
ConstInt and crashes on the dynamic tuple form allowed by
DeduceTileExpandCloneType; update the rewrite to accept the runtime tuple shape
instead of hard-failing. In practice, change ExtractConstShape to (1) accept a
MakeTuple whose elements may be ConstInt OR runtime integer scalar expressions
produced by DeduceTileExpandCloneType (e.g., TupleGetItemExpr / index/int scalar
Exprs), push those elements into the returned vector without CHECKing ConstInt,
and only error if shape is not a MakeTuple at all or the tuple is empty; or
alternatively, move a validation into DeduceTileExpandCloneType to reject
non-constant shapes earlier. Reference: function ExtractConstShape and
DeduceTileExpandCloneType / TupleGetItemExpr handling in
src/ir/op/tile_ops/broadcast.cpp.

In `@tests/ut/ir/operators/test_tensor_ops.py`:
- Around line 1479-1489: The test uses an invalid target shape [8, 16, 8] for
expand_clone given the input tensor_var shape [4, 1, 8]; update the target shape
passed to ir.op.tensor.expand_clone to a valid non-broadcast match (e.g., [4,
16, 8]) so non-broadcast axes align with the input, and keep the rest of the
assertions (call, call.op.name, result_type checks) unchanged.

---

Nitpick comments:
In `@src/ir/op/tile_ops/broadcast.cpp`:
- Around line 153-164: The CHECK messages concatenating op_name lack a
separating space, producing messages like "tile.expand_clonerequires..."; update
the three CHECK calls that reference op_name (the one asserting args.size()==2,
the one validating tile_type via As<TileType>, and the one validating
shape_tuple_type via As<TupleType>) to include a space after op_name in the
formatted string (e.g., change '" << op_name << "requires' to '" << op_name << "
requires"') so the operator name and the rest of the message are separated for
readability.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c59c10aa-6e7d-4d77-9130-6aff7d4abc27

📥 Commits

Reviewing files that changed from the base of the PR and between d765fc0 and 97431a6.

📒 Files selected for processing (26)
  • CMakeLists.txt
  • docs/en/dev/passes/00-pass_manager.md
  • docs/zh-cn/dev/passes/00-pass_manager.md
  • include/pypto/ir/transforms/pass_properties.h
  • include/pypto/ir/transforms/passes.h
  • include/pypto/ir/type_inference.h
  • python/bindings/modules/passes.cpp
  • python/pypto/ir/op/tensor_ops.py
  • python/pypto/ir/op/tile_ops.py
  • python/pypto/ir/pass_manager.py
  • python/pypto/language/__init__.py
  • python/pypto/language/op/__init__.py
  • python/pypto/language/op/tensor_ops.py
  • python/pypto/language/op/tile_ops.py
  • python/pypto/language/op/unified_ops.py
  • python/pypto/pypto_core/passes.pyi
  • src/ir/op/tensor_ops/broadcast.cpp
  • src/ir/op/tensor_ops/transform.cpp
  • src/ir/op/tile_ops/broadcast.cpp
  • src/ir/op/tile_ops/transform.cpp
  • src/ir/op/type_inference.cpp
  • src/ir/transforms/op_conversion_registry.cpp
  • src/ir/transforms/substitute_tiles_pass.cpp
  • tests/ut/ir/operators/test_tensor_ops.py
  • tests/ut/ir/operators/test_tile_ops.py
  • tests/ut/ir/transforms/test_substitute_tiles.py

| OutlineIncoreScopes | TypeChecked, SSAForm | SplitIncoreOrch | — |
| OutlineClusterScopes | TypeChecked, SSAForm | ClusterOutlined | — |
| ConvertTensorToTileOps | SplitIncoreOrch | IncoreTileOps | — |
| SubstituteTiles | SSAForm, IncoreTileOps | SSAForm, IncoreTileOps | — |


⚠️ Potential issue | 🟡 Minor

The SubstituteTiles property table is out of sync with the implementation.

Line 71 is missing NormalizedStmtStructure, which kSubstituteTilesProperties in the code lists as required/produced. The documentation table fields should be synchronized.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/zh-cn/dev/passes/00-pass_manager.md` at line 71, The table row for
SubstituteTiles is missing NormalizedStmtStructure which causes a mismatch with
the implementation: update the table row for "SubstituteTiles" so its input and
output columns include "NormalizedStmtStructure" to match the
kSubstituteTilesProperties (which lists NormalizedStmtStructure as
required/produced); ensure the document's SSAForm, IncoreTileOps entries reflect
the same required/produced fields as the code.

Comment on lines +605 to +617
```python
def expand_clone(input: Tensor, shape: Sequence[IntLike]) -> Tensor:
    """Clone and expand input to target shape.

    Args:
        input: Input tensor
        shape: Target shape dimensions

    Returns:
        Tensor wrapping the expand_clone operation
    """
    input_expr = input.unwrap()
    call_expr = _ir_ops.expand_clone(input_expr, _normalize_intlike(shape))
    return Tensor(expr=call_expr)
```


⚠️ Potential issue | 🟠 Major

Expose valid_shape in the public tensor expand_clone API.

At Line 605, the language wrapper only accepts shape, but the IR op supports optional valid_shape. This blocks valid dynamic-shape use cases from the DSL surface.

Proposed fix
```diff
-def expand_clone(input: Tensor, shape: Sequence[IntLike]) -> Tensor:
+def expand_clone(
+    input: Tensor,
+    shape: Sequence[IntLike],
+    *,
+    valid_shape: Sequence[IntLike] | None = None,
+) -> Tensor:
@@
-    call_expr = _ir_ops.expand_clone(input_expr, _normalize_intlike(shape))
+    normalized_valid_shape = None if valid_shape is None else _normalize_intlike(valid_shape)
+    call_expr = _ir_ops.expand_clone(input_expr, _normalize_intlike(shape), normalized_valid_shape)
     return Tensor(expr=call_expr)
```
🧰 Tools
🪛 Ruff (0.15.7)

[error] 605-605: Function argument input is shadowing a Python builtin

(A002)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/pypto/language/op/tensor_ops.py` around lines 605 - 617, The public
function expand_clone currently only accepts shape but the IR op
_ir_ops.expand_clone also supports an optional valid_shape; update the
expand_clone signature to accept valid_shape: Optional[Sequence[IntLike]] =
None, normalize it (e.g. via _normalize_intlike) and pass it to
_ir_ops.expand_clone alongside the normalized shape, and update the
docstring/type hints accordingly; keep existing use of input.unwrap() and return
Tensor(expr=call_expr).

Comment on lines +298 to +304
```python
def expand_clone(input: T, shape: Sequence[IntLike]) -> T:
    """Clone and expand input to target shape, dispatched by input type."""
    if isinstance(input, Tensor):
        return _tensor.expand_clone(input, shape)
    if isinstance(input, Tile):
        return _tile.expand_clone(input, shape)
    raise TypeError(f"expand_clone: expected Tensor or Tile, got {type(input).__name__}")
```


⚠️ Potential issue | 🟠 Major

Unified expand_clone should support tensor valid_shape.

At Line 298-304, pl.expand_clone(...) only accepts shape, so callers using unified ops still cannot pass tensor valid_shape even if tensor-level support is added.

Proposed fix
```diff
-def expand_clone(input: T, shape: Sequence[IntLike]) -> T:
+def expand_clone(
+    input: T,
+    shape: Sequence[IntLike],
+    valid_shape: Sequence[IntLike] | None = None,
+) -> T:
@@
-    if isinstance(input, Tensor):
-        return _tensor.expand_clone(input, shape)
+    if isinstance(input, Tensor):
+        return _tensor.expand_clone(input, shape, valid_shape=valid_shape)
     if isinstance(input, Tile):
+        if valid_shape is not None:
+            raise TypeError("expand_clone: valid_shape is only supported for Tensor inputs")
         return _tile.expand_clone(input, shape)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
-def expand_clone(input: T, shape: Sequence[IntLike]) -> T:
-    """Clone and expand input to target shape, dispatched by input type."""
-    if isinstance(input, Tensor):
-        return _tensor.expand_clone(input, shape)
-    if isinstance(input, Tile):
-        return _tile.expand_clone(input, shape)
-    raise TypeError(f"expand_clone: expected Tensor or Tile, got {type(input).__name__}")
+def expand_clone(
+    input: T,
+    shape: Sequence[IntLike],
+    valid_shape: Sequence[IntLike] | None = None,
+) -> T:
+    """Clone and expand input to target shape, dispatched by input type."""
+    if isinstance(input, Tensor):
+        return _tensor.expand_clone(input, shape, valid_shape=valid_shape)
+    if isinstance(input, Tile):
+        if valid_shape is not None:
+            raise TypeError("expand_clone: valid_shape is only supported for Tensor inputs")
+        return _tile.expand_clone(input, shape)
+    raise TypeError(f"expand_clone: expected Tensor or Tile, got {type(input).__name__}")
```
🧰 Tools
🪛 Ruff (0.15.7)

[error] 298-298: Function argument input is shadowing a Python builtin

(A002)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/pypto/language/op/unified_ops.py` around lines 298 - 304, The unified
expand_clone currently only accepts shape so callers cannot pass a tensor-level
valid_shape; update the expand_clone signature to accept an optional valid_shape
(e.g. valid_shape: Optional[Sequence[IntLike]] = None) and pass it through to
the tensor implementation (_tensor.expand_clone(input, shape, valid_shape));
keep calling _tile.expand_clone(input, shape) for Tile or forward valid_shape if
Tile supports it. Also update the Type hints and the error message path in
expand_clone to reflect the new parameter so callers can supply valid_shape when
operating on Tensors.

@wuzhf9 wuzhf9 force-pushed the issue679 branch 2 times, most recently from 0ebbe63 to 8550a5c on April 2, 2026 at 10:27

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (2)
src/ir/op/type_inference.cpp (1)

286-296: Consider overflow protection for extremely large shapes.

The multiplication product *= const_dim->value_ has no overflow check. While unlikely for typical tensor shapes, this could silently wrap for pathological inputs. The usage in DeduceTileReshapeType checks product > 0, which won't detect overflow that wraps to a positive value.

If this is a concern, consider adding overflow detection:

🔧 Optional: Add overflow detection
```diff
 int64_t ComputeShapeProduct(const std::vector<ExprPtr>& shape) {
   int64_t product = 1;
   for (const auto& dim : shape) {
     auto const_dim = As<ConstInt>(dim);
     if (!const_dim) {
       return -1;  // Dynamic shape, cannot compute product
     }
+    if (const_dim->value_ <= 0) {
+      return -1;  // Invalid dimension
+    }
+    // Check for overflow before multiplication
+    if (product > std::numeric_limits<int64_t>::max() / const_dim->value_) {
+      return -1;  // Would overflow
+    }
     product *= const_dim->value_;
   }
   return product;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/op/type_inference.cpp` around lines 286 - 296, ComputeShapeProduct may
silently overflow when multiplying many or huge dimensions; modify
ComputeShapeProduct to detect overflow before doing product *= const_dim->value_
(use INT64_MAX / const_dim->value_ check or an equivalent safe multiply) and
return -1 if overflow is detected (treat overflow as dynamic/unknown) so callers
like DeduceTileReshapeType that test product > 0 won't be misled; update the
function that iterates over shape (ComputeShapeProduct and the use of
ConstInt::value_) to perform this pre-multiply check and include <limits> as
needed.
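For reference, the same overflow-safe product check can be sketched in Python (illustrative only; the real helper is the C++ ComputeShapeProduct, and the explicit int64 bound here is an assumption mirroring the suggested fix):

```python
# Python analogue of the overflow-safe shape product: returns -1 for dynamic,
# invalid, or overflowing shapes, mirroring the C++ sentinel convention.
INT64_MAX = 2**63 - 1

def compute_shape_product(shape):
    """Return the product of constant dims, or -1 if dynamic/invalid/overflow."""
    product = 1
    for dim in shape:
        if not isinstance(dim, int):
            return -1          # dynamic dimension: product unknown
        if dim <= 0:
            return -1          # invalid dimension
        if product > INT64_MAX // dim:
            return -1          # multiplication would overflow int64
        product *= dim
    return product

print(compute_shape_product([4, 16, 8]))      # 512
print(compute_shape_product([2**40, 2**40]))  # -1 (overflow)
```

The pre-multiply division check is the standard way to detect signed overflow before it happens, so callers that test `product > 0` are never misled by a wrapped value.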
src/ir/transforms/substitute_tiles_pass.cpp (1)

106-107: Consider visiting shape_expr for consistency with tile_src.

tile_src is visited via VisitExpr, but shape_expr is used directly. While ExtractConstShape currently requires all-constant elements (making remapping moot), this could become a subtle bug if ExtractConstShape is relaxed in the future to support symbolic dimensions. Visiting both arguments would make the code more resilient to future changes.

```diff
     auto tile_src = VisitExpr(call->args_[0]);
-    auto shape_expr = call->args_[1];
+    auto shape_expr = VisitExpr(call->args_[1]);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/transforms/substitute_tiles_pass.cpp` around lines 106 - 107, The code
visits the first call argument (tile_src) with VisitExpr but uses call->args_[1]
(shape_expr) directly; update the SubstituteTilesPass (in
substitute_tiles_pass.cpp) to visit shape_expr via VisitExpr as well (e.g.,
replace direct use of call->args_[1] with VisitExpr(call->args_[1])) so both
operands are normalized before calling ExtractConstShape; this keeps behavior
consistent with tile_src and prevents future bugs if ExtractConstShape is
extended to handle symbolic dimensions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ir/op/tensor_ops/broadcast.cpp`:
- Around line 189-190: The CHECK error message concatenates op_name directly
with "requires" causing a missing space; update the CHECK in broadcast.cpp that
references shape_tuple_type and op_name (and uses
args[1]->GetType()->TypeName()) to include a space between op_name and
"requires" (e.g., change the string to " requires shape to be TupleType, but got
") so the logged message is properly spaced.
- Around line 195-199: The error messages in broadcast.cpp concatenate op_name
and the rest of the message (e.g., op_name << "shape tuple element..."),
producing missing spaces; update the two CHECK log messages that reference
scalar_type, shape_tuple_type->types_[i]->TypeName(), and
scalar_type->dtype_.ToString() so they insert a space after op_name (e.g.,
op_name << " shape tuple element " ...) and ensure consistent spacing before the
rest of the text to produce readable messages.
- Around line 179-180: The CHECK message in src/ir/op/tensor_ops/broadcast.cpp
concatenates op_name with the following string, producing no space (e.g.,
"tensor.expand_clonerequires..."); update the CHECK call that uses op_name to
include a separating space in the error string (e.g., add a trailing space in
the literal or insert " << ' ' << ") so the output becomes "op_name requires 2
or 3 arguments..." while leaving the existing CHECK(args.size() == 2 ||
args.size() == 3) and variable names unchanged.
- Line 247: The CHECK call that emits the message concatenates op_name and the
literal without a separating space: in the CHECK(valid_shape_tuple) << op_name
<< "valid_shape (3rd argument) must be a MakeTuple"; expression, insert a space
or separator (e.g., << " " << or << ": " <<) between op_name and the message so
the logged output reads correctly; update the CHECK usage in broadcast.cpp (the
CHECK(valid_shape_tuple) line referencing op_name) accordingly.
- Around line 184-185: The error message built in the CHECK(tensor_type)
statement concatenates op_name directly to the string "requires..." with no
space; update the CHECK(tensor_type) << op_name << "requires first argument..."
expression to include a space between op_name and the rest of the message (e.g.,
insert a " " between op_name and the literal or prepend the literal with a
leading space) so the emitted message reads "<op_name> requires first argument
to be a TensorType, but got ...".

---

Nitpick comments:
In `@src/ir/op/type_inference.cpp`:
- Around line 286-296: ComputeShapeProduct may silently overflow when
multiplying many or huge dimensions; modify ComputeShapeProduct to detect
overflow before doing product *= const_dim->value_ (use INT64_MAX /
const_dim->value_ check or an equivalent safe multiply) and return -1 if
overflow is detected (treat overflow as dynamic/unknown) so callers like
DeduceTileReshapeType that test product > 0 won't be misled; update the function
that iterates over shape (ComputeShapeProduct and the use of ConstInt::value_)
to perform this pre-multiply check and include <limits> as needed.

In `@src/ir/transforms/substitute_tiles_pass.cpp`:
- Around line 106-107: The code visits the first call argument (tile_src) with
VisitExpr but uses call->args_[1] (shape_expr) directly; update the
SubstituteTilesPass (in substitute_tiles_pass.cpp) to visit shape_expr via
VisitExpr as well (e.g., replace direct use of call->args_[1] with
VisitExpr(call->args_[1])) so both operands are normalized before calling
ExtractConstShape; this keeps behavior consistent with tile_src and prevents
future bugs if ExtractConstShape is extended to handle symbolic dimensions.

📥 Commits

Reviewing files that changed from the base of the PR and between 97431a6 and 8550a5c.

📒 Files selected for processing (26)
  • CMakeLists.txt
  • docs/en/dev/passes/00-pass_manager.md
  • docs/zh-cn/dev/passes/00-pass_manager.md
  • include/pypto/ir/transforms/pass_properties.h
  • include/pypto/ir/transforms/passes.h
  • include/pypto/ir/type_inference.h
  • python/bindings/modules/passes.cpp
  • python/pypto/ir/op/tensor_ops.py
  • python/pypto/ir/op/tile_ops.py
  • python/pypto/ir/pass_manager.py
  • python/pypto/language/__init__.py
  • python/pypto/language/op/__init__.py
  • python/pypto/language/op/tensor_ops.py
  • python/pypto/language/op/tile_ops.py
  • python/pypto/language/op/unified_ops.py
  • python/pypto/pypto_core/passes.pyi
  • src/ir/op/tensor_ops/broadcast.cpp
  • src/ir/op/tensor_ops/transform.cpp
  • src/ir/op/tile_ops/broadcast.cpp
  • src/ir/op/tile_ops/transform.cpp
  • src/ir/op/type_inference.cpp
  • src/ir/transforms/op_conversion_registry.cpp
  • src/ir/transforms/substitute_tiles_pass.cpp
  • tests/ut/ir/operators/test_tensor_ops.py
  • tests/ut/ir/operators/test_tile_ops.py
  • tests/ut/ir/transforms/test_substitute_tiles.py
✅ Files skipped from review due to trivial changes (5)
  • src/ir/transforms/op_conversion_registry.cpp
  • include/pypto/ir/transforms/pass_properties.h
  • tests/ut/ir/operators/test_tensor_ops.py
  • include/pypto/ir/type_inference.h
  • tests/ut/ir/operators/test_tile_ops.py
🚧 Files skipped from review as they are similar to previous changes (10)
  • docs/en/dev/passes/00-pass_manager.md
  • python/bindings/modules/passes.cpp
  • CMakeLists.txt
  • include/pypto/ir/transforms/passes.h
  • python/pypto/pypto_core/passes.pyi
  • python/pypto/language/op/tile_ops.py
  • python/pypto/ir/op/tensor_ops.py
  • src/ir/op/tensor_ops/transform.cpp
  • src/ir/op/tile_ops/broadcast.cpp
  • docs/zh-cn/dev/passes/00-pass_manager.md

Comment on lines +184 to +185
CHECK(tensor_type) << op_name << "requires first argument to be a TensorType, but got "
<< args[0]->GetType()->TypeName();
⚠️ Potential issue | 🟡 Minor

Missing space in error message.

Same issue as above - op_name should be followed by a space.

Proposed fix
-  CHECK(tensor_type) << op_name << "requires first argument to be a TensorType, but got "
+  CHECK(tensor_type) << op_name << " requires first argument to be a TensorType, but got "
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
CHECK(tensor_type) << op_name << "requires first argument to be a TensorType, but got "
<< args[0]->GetType()->TypeName();
CHECK(tensor_type) << op_name << " requires first argument to be a TensorType, but got "
<< args[0]->GetType()->TypeName();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/op/tensor_ops/broadcast.cpp` around lines 184 - 185, The error message
built in the CHECK(tensor_type) statement concatenates op_name directly to the
string "requires..." with no space; update the CHECK(tensor_type) << op_name <<
"requires first argument..." expression to include a space between op_name and
the rest of the message (e.g., insert a " " between op_name and the literal or
prepend the literal with a leading space) so the emitted message reads
"<op_name> requires first argument to be a TensorType, but got ...".

Comment on lines +189 to +190
CHECK(shape_tuple_type) << op_name << "requires shape to be TupleType, but got "
<< args[1]->GetType()->TypeName();
⚠️ Potential issue | 🟡 Minor

Missing space in error message.

Proposed fix
-  CHECK(shape_tuple_type) << op_name << "requires shape to be TupleType, but got "
+  CHECK(shape_tuple_type) << op_name << " requires shape to be TupleType, but got "
📝 Committable suggestion


Suggested change
CHECK(shape_tuple_type) << op_name << "requires shape to be TupleType, but got "
<< args[1]->GetType()->TypeName();
CHECK(shape_tuple_type) << op_name << " requires shape to be TupleType, but got "
<< args[1]->GetType()->TypeName();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/op/tensor_ops/broadcast.cpp` around lines 189 - 190, The CHECK error
message concatenates op_name directly with "requires" causing a missing space;
update the CHECK in broadcast.cpp that references shape_tuple_type and op_name
(and uses args[1]->GetType()->TypeName()) to include a space between op_name and
"requires" (e.g., change the string to " requires shape to be TupleType, but got
") so the logged message is properly spaced.

Comment on lines +195 to +199
CHECK(scalar_type) << op_name << "shape tuple element " << i << " must be ScalarType, but got "
<< shape_tuple_type->types_[i]->TypeName();
CHECK(scalar_type->dtype_.IsInt())
<< op_name << "shape tuple element " << i << " must have integer dtype, but got "
<< scalar_type->dtype_.ToString();
⚠️ Potential issue | 🟡 Minor

Missing spaces in error messages for shape tuple element validation.

Proposed fix
-    CHECK(scalar_type) << op_name << "shape tuple element " << i << " must be ScalarType, but got "
+    CHECK(scalar_type) << op_name << " shape tuple element " << i << " must be ScalarType, but got "
                        << shape_tuple_type->types_[i]->TypeName();
-    CHECK(scalar_type->dtype_.IsInt())
-        << op_name << "shape tuple element " << i << " must have integer dtype, but got "
+    CHECK(scalar_type->dtype_.IsInt())
+        << op_name << " shape tuple element " << i << " must have integer dtype, but got "
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/op/tensor_ops/broadcast.cpp` around lines 195 - 199, The error
messages in broadcast.cpp concatenate op_name and the rest of the message (e.g.,
op_name << "shape tuple element..."), producing missing spaces; update the two
CHECK log messages that reference scalar_type,
shape_tuple_type->types_[i]->TypeName(), and scalar_type->dtype_.ToString() so
they insert a space after op_name (e.g., op_name << " shape tuple element " ...)
and ensure consistent spacing before the rest of the text to produce readable
messages.

// If valid_shape is provided as 3rd argument, store it in TensorView
if (args.size() == 3) {
auto valid_shape_tuple = As<MakeTuple>(args[2]);
CHECK(valid_shape_tuple) << op_name << "valid_shape (3rd argument) must be a MakeTuple";
⚠️ Potential issue | 🟡 Minor

Missing space in error message.

Proposed fix
-    CHECK(valid_shape_tuple) << op_name << "valid_shape (3rd argument) must be a MakeTuple";
+    CHECK(valid_shape_tuple) << op_name << " valid_shape (3rd argument) must be a MakeTuple";
📝 Committable suggestion


Suggested change
CHECK(valid_shape_tuple) << op_name << "valid_shape (3rd argument) must be a MakeTuple";
CHECK(valid_shape_tuple) << op_name << " valid_shape (3rd argument) must be a MakeTuple";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/op/tensor_ops/broadcast.cpp` at line 247, The CHECK call that emits
the message concatenates op_name and the literal without a separating space: in
the CHECK(valid_shape_tuple) << op_name << "valid_shape (3rd argument) must be a
MakeTuple"; expression, insert a space or separator (e.g., << " " << or << ": "
<<) between op_name and the message so the logged output reads correctly; update
the CHECK usage in broadcast.cpp (the CHECK(valid_shape_tuple) line referencing
op_name) accordingly.

@wuzhf9 wuzhf9 force-pushed the issue679 branch 3 times, most recently from 19f8526 to ed4a806 Compare April 9, 2026 07:41
@wuzhf9 wuzhf9 changed the title feat(ir): Added expand_clone ops and substitute_tiles pass feat(ir): Added tensor.expand_clone ops Apr 9, 2026
# Summary
- Add tensor.expand_clone
- Implement per-dimension broadcast behavior:
  - **dim0**: load once, loop over dst.size(0), store at [i, 0, 0]
  - **dim1**: per-row load [1,1,n], create [1,k,n], col_expand, store at [i, 0, 0]
  - **dim2**: load with valid_shape=[m,k,1], row_expand, store once at [0,0,0]
  - **no broadcast**: direct tile.store of loaded input into target
- Ensure expand_clone is treated as self-loading in ConvertTensorToTileOps.
- Update unit/runtime tests and expected IR to reflect the store-based semantics.

# Behavior Notes
- Expand clone allows at most one broadcast dimension; all other dims must match.
- Broadcasting uses **tensor-level expand_clone** (no tile API), writing results into the provided target tensor.

# Tests
- Update IR conversion tests for expand_clone (dim0/1/2).
- Update unit tests for expand_clone (dim0/1/2).
- Update runtime expand_clone tests to cover dim0/1/2 via tensor.expand_clone.
Development

Successfully merging this pull request may close these issues.

[New Op] expand_clone
