refactor(ir): extract OptimizeOrchTensors pass and simplify ConvertTensorToTileOps by Hzfengsy · Pull Request #969 · hw-native-sys/pypto

Hzfengsy · 2026-04-11T06:45:42Z

Summary

Addresses #962 (Approach A — post-pass) by extracting buffer optimization patterns from ConvertTensorToTileOps into a standalone OptimizeOrchTensors pass.

- Extract OptimizeOrchTensors as a new pass with 3 pattern classes:
  - IterArgReuseOptimizer (Pattern 1): merges Out→InOut for loop-carried buffers
  - AssembleParentStridesOptimizer (Pattern 2): attaches parent-tensor strides via TensorView
  - AssembleLoopRewriter (Pattern 3): rewrites tile.assemble loops to tile.store loops
- Simplify ConvertTensorToTileOps: remove alias analysis (~390 lines), iter-arg mapping, and IfStmt store sinking, leaving a purely mechanical tensor→tile conversion
- Restore MatmulSlice handling, which was accidentally removed; it produces tile.load(Mat, transpose=...) for slice→matmul patterns
- Replace local VarUseVisitor with shared var_collectors::VarDefUseCollector
- Remove dead Pattern 3 (loop hoisting) code from the old implementation
- Add documentation for both passes (en + zh-cn) and renumber existing pass docs to match pipeline order
- Add IncoreTileOps to OptimizeOrchTensors pass properties (required + produced) to enforce correct ordering
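The ordering constraint in the last item can be illustrated with a small, self-contained sketch. It is not the pypto implementation: `PassSpec` and `check_pipeline` are hypothetical names, and the property set assumed for ConvertTensorToTileOps is a guess made only so the example runs. The OptimizeOrchTensors sets follow the `{SplitIncoreOrch, IncoreTileOps}` contract reported in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassSpec:
    """Hypothetical stand-in for a pass and its property contract."""
    name: str
    required: frozenset = frozenset()
    produced: frozenset = frozenset()


def check_pipeline(passes, initial=frozenset()):
    """Return the first ordering violation as (pass_name, missing), or None."""
    available = set(initial)
    for p in passes:
        missing = p.required - available
        if missing:
            return p.name, sorted(missing)
        available |= p.produced
    return None


# Assumption: ConvertTensorToTileOps is the pass that establishes IncoreTileOps.
CONVERT = PassSpec(
    "ConvertTensorToTileOps",
    required=frozenset({"SplitIncoreOrch"}),
    produced=frozenset({"SplitIncoreOrch", "IncoreTileOps"}),
)
# From the PR: OptimizeOrchTensors both requires and produces these properties.
OPTIMIZE = PassSpec(
    "OptimizeOrchTensors",
    required=frozenset({"SplitIncoreOrch", "IncoreTileOps"}),
    produced=frozenset({"SplitIncoreOrch", "IncoreTileOps"}),
)

ok = check_pipeline([CONVERT, OPTIMIZE], initial={"SplitIncoreOrch"})
bad = check_pipeline([OPTIMIZE, CONVERT], initial={"SplitIncoreOrch"})
```

Under these assumptions, running OptimizeOrchTensors first is rejected because IncoreTileOps has not been produced yet, which is the effect the property change is meant to have.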

Cross-layer changes

Layer	Files
C++ implementation	`convert_tensor_to_tile_ops_pass.cpp`, `optimize_orch_tensors_pass.cpp` (new)
C++ headers	`passes.h`, `pass_properties.h`
Build system	`CMakeLists.txt`
Python bindings	`passes.cpp`
Type stubs	`passes.pyi`
Pass manager	`pass_manager.py`
Tests	`test_convert_tensor_to_tile_ops.py`, `test_optimize_orch_tensors.py` (new), `test_pass_manager.py`
Documentation	`09-convert_tensor_to_tile_ops.md` (new), `10-optimize_orch_tensors.md` (new), doc renumbering

Testing

- All 71 transform tests pass (51 ConvertTensorToTileOps + 8 OptimizeOrchTensors + 12 PassManager)
- Full transform suite: 959 passed, 12 skipped
- Clang-tidy clean (all findings fixed)
- All pre-commit hooks pass (clang-format, cpplint, ruff, pyright, markdownlint)

Related Issues

Addresses #962

…OrchTensors

- Extract optimize_orch_tensors_pass.cpp as a standalone pass with 3 pattern classes: IterArgReuseOptimizer, AssembleParentStridesOptimizer, AssembleLoopRewriter (previously interleaved free functions)
- Remove dead Pattern 3 (loop hoisting) code from OptimizeOrchTensors
- Restore MatmulSlice handling in ConvertTensorToTileOps (was accidentally removed; produces tile.load(Mat, transpose=...) for slice→matmul patterns)
- Remove alias analysis (~390 lines) from ConvertTensorToTileOps as it violates single-responsibility; direction inference is handled by OptimizeOrchTensors Pattern 1
- Replace local VarUseVisitor with common var_collectors::VarDefUseCollector
- Add documentation for both passes (en + zh-cn) and renumber pass docs to match pipeline execution order

- Add IncoreTileOps to required/produced in kOptimizeOrchTensorsProperties to enforce correct pass ordering (must run after ConvertTensorToTileOps)
- Renumber Pattern 4 → Pattern 3 in docs (old Pattern 3 was removed)
- Fix type stub docstring to mention both orchestration and InCore functions

coderabbitai · 2026-04-11T06:45:57Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.


Walkthrough

This PR extracts orchestration-level buffer/layout optimizations from ConvertTensorToTileOps into a new OptimizeOrchTensors pass, adds its implementation, docs, Python bindings, tests, and updates pass ordering and registration; ConvertTensorToTileOps is simplified accordingly.

Changes

Cohort / File(s)	Summary
Pass Documentation Ordering `.claude/rules/pass-doc-ordering.md`	Updated pass ordering: added `09-convert_tensor_to_tile_ops.md` and `10-optimize_orch_tensors.md`, renumbered subsequent entries and moved non-default strategy docs to 90/91.
Build System `CMakeLists.txt`	Added `src/ir/transforms/optimize_orch_tensors_pass.cpp` to compiled sources.
English Docs `docs/en/dev/passes/09-convert_tensor_to_tile_ops.md`, `docs/en/dev/passes/10-optimize_orch_tensors.md`	Added ConvertTensorToTileOps doc and new OptimizeOrchTensors doc describing pass APIs, preconditions, sequencing, and three optimization patterns.
Chinese Docs `docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md`, `docs/zh-cn/dev/passes/10-optimize_orch_tensors.md`	Added Chinese equivalents for both pass docs.
Pass Properties `include/pypto/ir/transforms/pass_properties.h`	Added `kOptimizeOrchTensorsProperties` with required/produced `{SplitIncoreOrch, IncoreTileOps}`.
Pass Declarations `include/pypto/ir/transforms/passes.h`	Declared new exported pass `pass::OptimizeOrchTensors()` and its sequencing after ConvertTensorToTileOps.
Python Bindings & Stubs `python/bindings/modules/passes.cpp`, `python/pypto/pypto_core/passes.pyi`	Added `passes.optimize_orch_tensors` factory binding and updated `.pyi` exports (including `optimize_orch_tensors`).
Pass Manager Integration `python/pypto/ir/pass_manager.py`, `tests/ut/ir/transforms/test_pass_manager.py`	Registered `OptimizeOrchTensors` in `tensor_only_passes`, making it part of the default optimization strategy; tests updated to expect it.
ConvertTensorToTileOps Implementation `src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp`	Removed in-pass orchestration analyses (`AnalyzeIterArgMappings`, `AnalyzeAssembleParentShapes`) and related helpers; simplified `TransformIncoreFunction` and variable-use detection; left matmul-slice collection local.
New Pass Implementation `src/ir/transforms/optimize_orch_tensors_pass.cpp`	Added new program pass implementing three ordered optimizers: IterArgReuseOptimizer, AssembleParentStridesOptimizer, AssembleLoopRewriter (iter-arg merge to InOut, parent-stride propagation via TensorView, assemble-loop → tile.store rewrite).
Tests — ConvertTensorToTileOps `tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py`	Adjusted/removed tests that previously asserted iter-arg/store-sinking and assemble-parent-stride effects; updated expected IR to reflect simpler ConvertTensorToTileOps outputs.
Tests — New Pass `tests/ut/ir/transforms/test_optimize_orch_tensors.py`	Added comprehensive tests covering iter-arg reuse, assemble parent-stride enrichment, assemble-loop rewrite, loop hoisting, and edge cases.

Sequence Diagram(s)

sequenceDiagram
    participant PM as Program
    participant PassMgr as PassManager
    participant OptPass as OptimizeOrchTensors
    participant ConvPass as ConvertTensorToTileOps
    participant IC as InCoreFunction
    participant Orch as Orchestration

    rect rgba(200,230,255,0.5)
    PM->>PassMgr: request optimization (Default)
    PassMgr->>ConvPass: run ConvertTensorToTileOps
    ConvPass->>IC: convert tensor.* → tile.* (matmul-slice prescan)
    end

    rect rgba(200,255,200,0.5)
    PassMgr->>OptPass: run OptimizeOrchTensors
    OptPass->>Orch: analyze tensor.create / assemble patterns in orchestration
    OptPass->>IC: rewrite InCore signatures (merge Out→InOut, update TensorView strides)
    OptPass->>Orch: update callsites (remove tensor.create, adjust calls)
    OptPass->>IC: rewrite assemble-loops → tile.store
    end

    PM->>PM: resulting optimized IR
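The signature and call-site bookkeeping behind the Out→InOut merge in the diagram can be sketched as follows. This is a toy model over plain lists, not the pass's IR mutators; `merge_out_into_inout`, `rewrite_call_args`, and the `{out_index: in_index}` shape of `merges` are assumptions made for illustration.

```python
def merge_out_into_inout(params, merges):
    """Fold each merged Out param into its paired input param.

    params: list of (name, direction) tuples.
    merges: hypothetical mapping {out_param_index: in_param_index}.
    """
    merged_targets = set(merges.values())
    new_params = []
    for idx, (name, direction) in enumerate(params):
        if idx in merges:
            continue  # the Out param disappears from the signature
        if idx in merged_targets:
            direction = "InOut"  # the input buffer is now written in place
        new_params.append((name, direction))
    return new_params


def rewrite_call_args(args, merges):
    """Drop the arguments that fed the removed Out params."""
    return [arg for idx, arg in enumerate(args) if idx not in merges]


params = [("acc", "In"), ("x", "In"), ("out", "Out")]
merges = {2: 0}  # the Out buffer at index 2 reuses the loop-carried arg at index 0
new_params = merge_out_into_inout(params, merges)
new_args = rewrite_call_args(["acc_iter", "x", "fresh_buf"], merges)
```

Once `fresh_buf` is no longer passed, the `tensor.create` that produced it in the orchestration function has no consumer, which is why the diagram lists its removal as part of the call-site update.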

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~85 minutes

Possibly related issues

[Code Health] Refactor ConvertTensorToTileOps: eliminate duplication, split oversized file, and unify matmul-slice path #890: Implements splitting ConvertTensorToTileOps responsibilities into a new OptimizeOrchTensors pass as proposed in the issue.
[RFC] Separate buffer optimizations from ConvertTensorToTileOps #962: Directly implements the RFC to move iter-arg/assemble analyses and buffer optimizations into a separate pass.

Possibly related PRs

fix(pass): store iter-arg returns to InOut params in ConvertTensorToTileOps #783: Adds iter-arg-aware AnalyzeIterArgMappings/store-sinking to ConvertTensorToTileOps — overlaps/conflicts with this PR which extracts that logic into OptimizeOrchTensors.
refactor(pass): replace hand-rolled recursion with IRMutator in ConvertTensorToTileOps #834: Refactors TransformIncore logic and matmul-slice handling in ConvertTensorToTileOps; closely related to the refactors present here.
fix(pass): sink iter-arg stores into IfStmt branches in ConvertTensorToTileOps #833: Implements IfStmt sinking/store-sinking in ConvertTensorToTileOps — functionally related to the store-sinking behavior moved into the new pass.

Suggested reviewers

lyfne123

🐰 Hop! A pass split—so neat!
Iter-args find their match,
Strides line up in place,
Loops rewrite with grace,
Orchestration hums apace! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.96% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main refactoring: extracting OptimizeOrchTensors as a new pass and simplifying ConvertTensorToTileOps, which are the core changes in this PR.
Description check	✅ Passed	The description is well-structured and directly related to the changeset, clearly explaining the motivation (issue `#962`), the three pattern classes extracted, simplifications made, and testing results.


gemini-code-assist

Code Review

This pull request introduces the OptimizeOrchTensors pass, which optimizes tensor buffer usage between orchestration and InCore functions through three patterns: iter-arg reuse, assemble parent stride propagation, and assemble-loop rewriting. It also refactors ConvertTensorToTileOps by moving cross-function analyses to this new pass. The feedback focuses on improving the robustness of IR traversals in the new pass, specifically recommending the use of shared utilities like VarDefUseCollector to ensure nested control flow structures are correctly handled during variable use analysis and loop optimization.
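The concern about nested control flow can be made concrete with a toy IR. The sketch below is not `VarDefUseCollector`; the `Assign`, `For`, and `If` classes and `collect_uses` are hypothetical stand-ins that only show why a use check has to recurse into loop and branch bodies.

```python
from dataclasses import dataclass, field


@dataclass
class Assign:
    target: str
    uses: list


@dataclass
class For:
    body: list


@dataclass
class If:
    then_body: list
    else_body: list = field(default_factory=list)


def collect_uses(stmts, found=None):
    """Collect every variable read anywhere in stmts, including nested bodies."""
    found = set() if found is None else found
    for stmt in stmts:
        if isinstance(stmt, Assign):
            found.update(stmt.uses)
        elif isinstance(stmt, For):
            collect_uses(stmt.body, found)
        elif isinstance(stmt, If):
            collect_uses(stmt.then_body, found)
            collect_uses(stmt.else_body, found)
    return found


# "buf" is only read inside an If nested in a For.
program = [For([If([Assign("y", ["buf"])])]), Assign("z", ["y"])]
uses = collect_uses(program)
```

A check that only inspected top-level assignments would report `buf` as unused here and could wrongly treat its definition as dead.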

src/ir/transforms/optimize_orch_tensors_pass.cpp

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

tests/ut/ir/transforms/test_optimize_orch_tensors.py (1)
21-236: Add a WhileStmt regression for Pattern 1.

The new pass docs and analyzer cover both ForStmt and WhileStmt, but this suite only exercises pl.range loops. A small while fixture would guard the VisitStmt_(const WhileStmtPtr&) path from drifting unnoticed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/ir/transforms/test_optimize_orch_tensors.py` around lines 21 - 236,
Add a new unit test method in the TestIterArgReuse class (e.g.,
test_while_iter_arg_reuse) that mirrors one of the existing ForStmt cases but
uses a WhileStmt-style loop in the main function to exercise the
VisitStmt_(const WhileStmtPtr&) path; keep the same main_incore_0 signature and
expected transformation (Out param merged into InOut), run After =
passes.optimize_orch_tensors()(Before) and call
ir.assert_structural_equal(After, Expected) so the optimize_orch_tensors pass
and the main/main_incore_0 pair are validated for a while-loop iter-arg
scenario.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/en/dev/passes/10-optimize_orch_tensors.md`:
- Around line 125-131: Update the pass contract table to include IncoreTileOps:
add IncoreTileOps to the Required column alongside SplitIncoreOrch and to the
Produced (or indicate as preserved) column so the doc reflects that the pass
requires and preserves IncoreTileOps to enforce ordering after
ConvertTensorToTileOps; reference the IncoreTileOps symbol and
ConvertTensorToTileOps in the brief note so readers understand why the contract
changed.

In `@docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md`:
- Around line 48-49: Update the docs to remove the claim that the assemble-loop
rewrite is performed in ConvertTensorToTileOps and instead point readers to the
later pass that performs it: change the sentence referencing "如果返回值来自
`tile.assemble` 循环，则将循环重写为直接使用 `tile.store`（assemble-loop 重写）" so it no longer
attributes the rewrite to `ConvertTensorToTileOps`, and add a brief note
directing readers to the `OptimizeOrchTensors` pass (and the term "assemble-loop
重写") as the component that actually performs this loop rewrite.

In `@docs/zh-cn/dev/passes/10-optimize_orch_tensors.md`:
- Around line 125-131: The Pass 属性表 is missing IncoreTileOps: update the table
so both the Required and Produced entries list the full set {SplitIncoreOrch,
IncoreTileOps} (i.e., replace the single "SplitIncoreOrch" value with the pair
that matches the implementation), ensuring the Pass attributes for the pass
named/represented by SplitIncoreOrch and IncoreTileOps in this document reflect
the actual implementation.

In `@src/ir/transforms/optimize_orch_tensors_pass.cpp`:
- Around line 1076-1124: The loop rewrite assumes op->iter_args_[0]->initValue_
is a disposable tensor created by tensor.create but doesn't verify it; update
VisitStmt_(const ForStmtPtr& op) to check that op->iter_args_[0]->initValue_ is
an Assign/Var whose defining statement is a tensor.create before inserting into
dead_create_vars_. Specifically, locate the use of iter_args_[0]->initValue_
(and the code that sets init_var and dead_create_vars_) and: 1) resolve
initValue_ to its defining AssignStmt/Call and confirm the call->op_->name_ ==
"tensor.create"; 2) optionally ensure the created Var has no other uses (reuse
StmtUsesVar or similar) before marking it dead; only then insert init_var.get()
into dead_create_vars_, otherwise skip marking it.
- Around line 326-329: The current code stores reuse_result into results_ keyed
only by fname causing the first matching caller to overwrite the callee
signature for all call sites; change this by either (A) verifying that every
call site of the callee yields the identical reuse_result.merges set before
writing to results_ (use the module's call graph / iterate all callers and
compare merge sets) and only then store results_[fname] = reuse_result and call
RewriteIncore()/CallSiteRewriter(), or (B) avoid global mutation and keep the
rewrite local: do not insert into results_ for fname, instead record
per-callsite rewrites (keyed by the caller or CallSite id) and only modify that
caller/callsite; update the logic around results_, reuse_result,
RewriteIncore(), and CallSiteRewriter() accordingly (same change also for the
analogous block at lines ~460-479).
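Option (A) from the last comment, requiring all call sites to agree before a callee's signature is rewritten, could look roughly like the sketch below. The data shapes are assumptions: `per_callsite_merges` maps a callee name to one merge dict per call site, and `agreed_merges` is a hypothetical helper rather than anything in the pass.

```python
def agreed_merges(per_callsite_merges):
    """Keep a callee's merge set only if every call site computed the same one.

    per_callsite_merges: {callee_name: [merges_for_site_0, merges_for_site_1, ...]}
    where each merges dict maps out_param_index -> in_param_index.
    """
    results = {}
    for callee, merge_list in per_callsite_merges.items():
        if not merge_list:
            continue  # never called: nothing to decide
        first = merge_list[0]
        if first and all(m == first for m in merge_list):
            results[callee] = first
    return results


candidates = {
    "main_incore_0": [{2: 0}, {2: 0}],  # both callers agree
    "main_incore_1": [{2: 0}, {}],      # second caller does not match
}
safe = agreed_merges(candidates)
```

With this gate, `main_incore_1` keeps its original signature, so the caller that did not match the pattern never has its Out argument dropped.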

📥 Commits

Reviewing files that changed from the base of the PR and between b2104ad and e526c60.

📒 Files selected for processing (30)

.claude/rules/pass-doc-ordering.md
CMakeLists.txt
docs/en/dev/passes/09-convert_tensor_to_tile_ops.md
docs/en/dev/passes/10-optimize_orch_tensors.md
docs/en/dev/passes/11-flatten_tile_nd_to_2d.md
docs/en/dev/passes/14-expand_mixed_kernel.md
docs/en/dev/passes/15-init_memref.md
docs/en/dev/passes/16-memory_reuse.md
docs/en/dev/passes/17-allocate_memory_addr.md
docs/en/dev/passes/90-insert_sync.md
docs/en/dev/passes/91-utility_passes.md
docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md
docs/zh-cn/dev/passes/10-optimize_orch_tensors.md
docs/zh-cn/dev/passes/11-flatten_tile_nd_to_2d.md
docs/zh-cn/dev/passes/14-expand_mixed_kernel.md
docs/zh-cn/dev/passes/15-init_memref.md
docs/zh-cn/dev/passes/16-memory_reuse.md
docs/zh-cn/dev/passes/17-allocate_memory_addr.md
docs/zh-cn/dev/passes/90-insert_sync.md
docs/zh-cn/dev/passes/91-utility_passes.md
include/pypto/ir/transforms/pass_properties.h
include/pypto/ir/transforms/passes.h
python/bindings/modules/passes.cpp
python/pypto/ir/pass_manager.py
python/pypto/pypto_core/passes.pyi
src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp
src/ir/transforms/optimize_orch_tensors_pass.cpp
tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py
tests/ut/ir/transforms/test_optimize_orch_tensors.py
tests/ut/ir/transforms/test_pass_manager.py

docs/en/dev/passes/10-optimize_orch_tensors.md

docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md

docs/zh-cn/dev/passes/10-optimize_orch_tensors.md

src/ir/transforms/optimize_orch_tensors_pass.cpp

Copilot

Pull request overview

This PR refactors the IR transform pipeline by extracting cross-function buffer/tensor optimizations out of ConvertTensorToTileOps into a new standalone OptimizeOrchTensors pass, leaving ConvertTensorToTileOps as a largely mechanical tensor→tile lowering and call-site update pass. It updates C++/Python pass registration, pass-manager ordering, adds focused unit tests for the new pass, and renumbers/extends pass documentation to match the updated pipeline.

Changes:

Introduce OptimizeOrchTensors (new pass) implementing iter-arg reuse, assemble parent-stride propagation, and assemble-loop rewrite.
Simplify ConvertTensorToTileOps by removing iter-arg/assemble analyses and parameter-direction inference, while keeping conversion-time MatmulSlice handling.
Update bindings, pass manager, tests, and English/Chinese pass docs (including doc ordering renumbering).

Reviewed changes

Copilot reviewed 16 out of 30 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp`	Removes bundled buffer optimizations/analyses; keeps core tensor→tile conversion and assemble-loop conversion behavior.
`src/ir/transforms/optimize_orch_tensors_pass.cpp`	New pass implementing the extracted orchestration/InCore tensor buffer optimizations (Patterns 1–3).
`include/pypto/ir/transforms/passes.h`	Declares the new `OptimizeOrchTensors` pass and documents intended behavior/order.
`include/pypto/ir/transforms/pass_properties.h`	Adds pass properties for `OptimizeOrchTensors` to enforce correct pipeline ordering.
`python/bindings/modules/passes.cpp`	Exposes `optimize_orch_tensors` in Python bindings with a pass description.
`python/pypto/ir/pass_manager.py`	Registers `OptimizeOrchTensors` in the default pipeline ordering after `ConvertTensorToTileOps`.
`python/pypto/pypto_core/passes.pyi`	Adds stub for `optimize_orch_tensors` and exports it via `__all__`.
`CMakeLists.txt`	Adds the new C++ pass source to the build.
`tests/ut/ir/transforms/test_pass_manager.py`	Updates expected pass order to include `OptimizeOrchTensors`.
`tests/ut/ir/transforms/test_optimize_orch_tensors.py`	New unit tests covering the three OptimizeOrchTensors patterns and edge cases.
`tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py`	Updates/realigns tests to validate “naive” ConvertTensorToTileOps behavior post-extraction.
`docs/en/dev/passes/09-convert_tensor_to_tile_ops.md`	New/updated documentation describing ConvertTensorToTileOps’ simplified responsibilities.
`docs/en/dev/passes/10-optimize_orch_tensors.md`	New documentation for OptimizeOrchTensors patterns and intended placement.
`docs/en/dev/passes/11-flatten_tile_nd_to_2d.md`	Renumbered/added doc to match pipeline ordering.
`docs/en/dev/passes/14-expand_mixed_kernel.md`	Renumbered doc to match pipeline ordering.
`docs/en/dev/passes/15-init_memref.md`	Renumbered doc to match pipeline ordering.
`docs/en/dev/passes/16-memory_reuse.md`	Renumbered doc to match pipeline ordering.
`docs/en/dev/passes/17-allocate_memory_addr.md`	Renumbered doc to match pipeline ordering.
`docs/en/dev/passes/90-insert_sync.md`	Added/renumbered documentation for InsertSync.
`docs/en/dev/passes/91-utility_passes.md`	Added/renumbered documentation for utility passes.
`docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md`	Chinese documentation for ConvertTensorToTileOps aligned with the refactor.
`docs/zh-cn/dev/passes/10-optimize_orch_tensors.md`	Chinese documentation for OptimizeOrchTensors aligned with the refactor.
`docs/zh-cn/dev/passes/11-flatten_tile_nd_to_2d.md`	Chinese doc added/renumbered to match pipeline ordering.
`docs/zh-cn/dev/passes/14-expand_mixed_kernel.md`	Chinese doc added/renumbered to match pipeline ordering.
`docs/zh-cn/dev/passes/15-init_memref.md`	Chinese doc added/renumbered to match pipeline ordering.
`docs/zh-cn/dev/passes/16-memory_reuse.md`	Chinese doc added/renumbered to match pipeline ordering.
`docs/zh-cn/dev/passes/17-allocate_memory_addr.md`	Chinese doc added/renumbered to match pipeline ordering.
`docs/zh-cn/dev/passes/90-insert_sync.md`	Chinese documentation for InsertSync added.
`docs/zh-cn/dev/passes/91-utility_passes.md`	Chinese documentation for utility passes added.
`.claude/rules/pass-doc-ordering.md`	Updates the doc numbering map to reflect the new pipeline order.

src/ir/transforms/optimize_orch_tensors_pass.cpp

docs/en/dev/passes/10-optimize_orch_tensors.md

docs/zh-cn/dev/passes/10-optimize_orch_tensors.md

tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py

…ents

Restore UpgradeWrittenTensorParamDirections in ConvertTensorToTileOps, which was accidentally removed during the OptimizeOrchTensors extraction. This fixes CI failures where InCore params written via tile.store were not upgraded from In to Out/InOut, causing codegen to emit add_input instead of add_output/add_inout.

Also addresses PR review feedback:

- Replace manual StmtUsesVar with VarDefUseCollector (handles all stmts)
- Fix pass properties tables in docs (add IncoreTileOps)
- Clarify assemble-loop rewrite attribution in docs
- Remove "future" qualifier from Pattern 2 test docstring
- Update test expectation for tensor.write param direction
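As a rough illustration of the direction upgrade described in this commit, the sketch below promotes `In` params that are written. The rule used to choose between `Out` and `InOut` (written and also read gives `InOut`, written only gives `Out`) is an assumption, since the commit message does not spell it out, and `upgrade_directions` is a hypothetical name.

```python
def upgrade_directions(params, reads, writes):
    """Promote In params that are written to Out or InOut.

    params: {param_name: direction}
    reads / writes: sets of param names read (e.g. via loads) or written
    (e.g. via tile.store) in the function body.
    Assumed rule: written and read -> InOut; written only -> Out.
    """
    upgraded = {}
    for name, direction in params.items():
        if direction == "In" and name in writes:
            direction = "InOut" if name in reads else "Out"
        upgraded[name] = direction
    return upgraded


params = {"a": "In", "b": "In", "c": "In"}
result = upgrade_directions(params, reads={"a", "b"}, writes={"b", "c"})
```

Leaving `b` or `c` as `In` is the failure mode the commit describes: codegen would then register a written buffer as a plain input.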

coderabbitai

♻️ Duplicate comments (2)

src/ir/transforms/optimize_orch_tensors_pass.cpp (2)
327-330: ⚠️ Potential issue | 🔴 Critical

Don't globalize a reuse decision from the first matching caller.

results_ is keyed only by callee name, so the first loop that matches rewrites that InCore signature for every call site. Any later caller that does not satisfy the same merges set will still lose those Out args and start reusing/mutating its input buffer in place. Please either prove that all callers agree on the exact merge set before storing here, or keep this optimization scoped to the matching call sites only.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/transforms/optimize_orch_tensors_pass.cpp` around lines 327 - 330, The
current code stores reuse_result into results_ keyed only by fname when
reuse_result.merges is non-empty, which globalizes a caller-specific merge
decision; change this so we either (A) validate that every call site for fname
has an identical reuse_result.merges before writing into results_ (iterate call
sites and compare merges sets) or (B) avoid modifying results_ and instead apply
reuse_result only to the specific matching call site(s) (i.e., keep the
optimization scoped locally rather than storing results_[fname] =
std::move(reuse_result)); use the existing symbols results_, reuse_result, fname
and the merges set to implement the chosen fix.
1077-1080: ⚠️ Potential issue | 🔴 Critical

Only rewrite/delete a loop seed when it is a dead tensor.create.

This matcher currently switches the iter-arg init to the Out param for any var-like seed and immediately records that seed in dead_create_vars_. If the seed comes from a real tensor value, or is referenced later, the rewrite changes semantics and LoopRewriteMutator will also erase a non-tensor.create assignment. Please require a tensor.create definition and prove the seed has no remaining uses before inserting it into the dead set.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/transforms/optimize_orch_tensors_pass.cpp` around lines 1077 - 1080,
The code currently flips the loop iter-arg init to the Out param and
unconditionally inserts the seed Var (init_var from
As<Var>(op->iter_args_[0]->initValue_)) into dead_create_vars_, which can
rewrite non-tensor.create seeds; change this to only perform the rewrite and
insert into dead_create_vars_ when the seed is proven to be a tensor.create with
no remaining uses. Concretely: after obtaining init_var, resolve its defining
expression (the Var's definition) and verify it's a tensor.create call (e.g.,
the defining node is a Call/Invoke of "tensor.create" or equivalent), and then
prove the Var has zero remaining uses before mutating iter_args_[0]->initValue_
and inserting init_var.get() into dead_create_vars_; otherwise leave the
iter-arg and do not record the var so LoopRewriteMutator won't erase
non-tensor.create assignments.
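The guard requested here reduces to two conditions on the loop seed. The sketch below states them over plain dictionaries; `can_drop_seed`, `defs`, and `other_uses` are hypothetical and stand in for whatever definition and use information the pass already collects.

```python
def can_drop_seed(seed, defs, other_uses):
    """Decide whether a loop seed may be replaced and its definition erased.

    seed: name of the variable used as the iter-arg init value.
    defs: {var_name: name of the op that defines it}.
    other_uses: {var_name: number of uses besides the iter-arg init}.
    """
    defined_by_create = defs.get(seed) == "tensor.create"
    has_no_other_uses = other_uses.get(seed, 0) == 0
    return defined_by_create and has_no_other_uses


defs = {"buf": "tensor.create", "x": "tensor.add"}
decisions = {
    "fresh": can_drop_seed("buf", defs, {}),
    "real_value": can_drop_seed("x", defs, {}),
    "still_used": can_drop_seed("buf", defs, {"buf": 1}),
}
```

Only the first case is safe to rewrite; in the other two the seed either carries a real value or is still referenced, so erasing its assignment would change program behavior.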

📥 Commits

Reviewing files that changed from the base of the PR and between e526c60 and ef273ea.

📒 Files selected for processing (7)

docs/en/dev/passes/09-convert_tensor_to_tile_ops.md
docs/en/dev/passes/10-optimize_orch_tensors.md
docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md
docs/zh-cn/dev/passes/10-optimize_orch_tensors.md
src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp
src/ir/transforms/optimize_orch_tensors_pass.cpp
tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py

✅ Files skipped from review due to trivial changes (4)

docs/en/dev/passes/10-optimize_orch_tensors.md
docs/en/dev/passes/09-convert_tensor_to_tile_ops.md
docs/zh-cn/dev/passes/10-optimize_orch_tensors.md
docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md

Hzfengsy added 2 commits April 11, 2026 14:44

Copilot AI review requested due to automatic review settings April 11, 2026 06:45

github-project-automation bot added this to pto project Apr 11, 2026

gemini-code-assist bot reviewed Apr 11, 2026

coderabbitai bot reviewed Apr 11, 2026

Copilot AI reviewed Apr 11, 2026

coderabbitai bot reviewed Apr 11, 2026

lyfne123 approved these changes Apr 11, 2026

lyfne123 merged commit ba0f363 into hw-native-sys:main Apr 11, 2026
14 of 15 checks passed

Hzfengsy deleted the worktree-ConvertTensorToTileOps branch April 11, 2026 13:52

This was referenced Apr 11, 2026

[RFC] Separate buffer optimizations from ConvertTensorToTileOps #962

Closed

[Code Health] Remove dead param direction inference (~400 LOC) from ConvertTensorToTileOps #973

Closed

3 participants
