refactor(ir): extract OptimizeOrchTensors pass and simplify ConvertTensorToTileOps#969

Merged
lyfne123 merged 3 commits into hw-native-sys:main from Hzfengsy:worktree-ConvertTensorToTileOps on Apr 11, 2026

@Hzfengsy
Member

Summary

Addresses #962 (Approach A — post-pass) by extracting buffer optimization patterns from ConvertTensorToTileOps into a standalone OptimizeOrchTensors pass.

  • Extract OptimizeOrchTensors as a new pass with 3 pattern classes:
    • IterArgReuseOptimizer (Pattern 1): merges Out→InOut for loop-carried buffers
    • AssembleParentStridesOptimizer (Pattern 2): attaches parent-tensor strides via TensorView
    • AssembleLoopRewriter (Pattern 3): rewrites tile.assemble loops to tile.store loops
  • Simplify ConvertTensorToTileOps: remove alias analysis (~390 lines), remove iter-arg mapping, remove IfStmt store sinking — now purely mechanical tensor→tile conversion
  • Restore MatmulSlice handling: was accidentally removed — produces tile.load(Mat, transpose=...) for slice→matmul patterns
  • Replace local VarUseVisitor with shared var_collectors::VarDefUseCollector
  • Remove dead Pattern 3 (loop hoisting) code from the old implementation
  • Add documentation for both passes (en + zh-cn) and renumber existing pass docs to match pipeline order
  • Add IncoreTileOps to OptimizeOrchTensors pass properties (required + produced) to enforce correct ordering
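
The ordering constraint in the last bullet can be pictured with a toy model. This is only a sketch, not PyPTO's pass manager: `PassSpec` and `check_pipeline` are hypothetical names, and the properties attached to ConvertTensorToTileOps here are an assumption inferred from this description, not read from `pass_properties.h`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PassSpec:
    """Hypothetical stand-in for a pass and its declared IR properties."""
    name: str
    required: frozenset = field(default_factory=frozenset)
    produced: frozenset = field(default_factory=frozenset)


def check_pipeline(passes, initial=frozenset()):
    """Return the first pass whose required properties are missing, or None."""
    available = set(initial)
    for p in passes:
        if not p.required <= available:
            return p.name
        available |= p.produced
    return None


# Assumption: ConvertTensorToTileOps is what makes IncoreTileOps available.
convert = PassSpec(
    "ConvertTensorToTileOps",
    required=frozenset({"SplitIncoreOrch"}),
    produced=frozenset({"SplitIncoreOrch", "IncoreTileOps"}),
)
# Matches the PR text: IncoreTileOps is both required and produced.
optimize = PassSpec(
    "OptimizeOrchTensors",
    required=frozenset({"SplitIncoreOrch", "IncoreTileOps"}),
    produced=frozenset({"SplitIncoreOrch", "IncoreTileOps"}),
)

start = frozenset({"SplitIncoreOrch"})
good = check_pipeline([convert, optimize], start)
bad = check_pipeline([optimize, convert], start)
```

In this model, scheduling OptimizeOrchTensors first is rejected because IncoreTileOps is not yet available, which is the effect the added property is meant to have.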

Cross-layer changes

| Layer | Files |
|---|---|
| C++ implementation | `convert_tensor_to_tile_ops_pass.cpp`, `optimize_orch_tensors_pass.cpp` (new) |
| C++ headers | `passes.h`, `pass_properties.h` |
| Build system | `CMakeLists.txt` |
| Python bindings | `passes.cpp` |
| Type stubs | `passes.pyi` |
| Pass manager | `pass_manager.py` |
| Tests | `test_convert_tensor_to_tile_ops.py`, `test_optimize_orch_tensors.py` (new), `test_pass_manager.py` |
| Documentation | `09-convert_tensor_to_tile_ops.md` (new), `10-optimize_orch_tensors.md` (new), doc renumbering |

Testing

  • All 71 transform tests pass (51 ConvertTensorToTileOps + 8 OptimizeOrchTensors + 12 PassManager)
  • Full transform suite: 959 passed, 12 skipped
  • Clang-tidy clean (all findings fixed)
  • All pre-commit hooks pass (clang-format, cpplint, ruff, pyright, markdownlint)

Related Issues

Addresses #962

…OrchTensors

- Extract optimize_orch_tensors_pass.cpp as standalone pass with 3 pattern
  classes: IterArgReuseOptimizer, AssembleParentStridesOptimizer,
  AssembleLoopRewriter (previously interleaved free functions)
- Remove dead Pattern 3 (loop hoisting) code from OptimizeOrchTensors
- Restore MatmulSlice handling in ConvertTensorToTileOps (was accidentally
  removed — produces tile.load(Mat, transpose=...) for slice→matmul patterns)
- Remove alias analysis (~390 lines) from ConvertTensorToTileOps as it
  violates single-responsibility; direction inference handled by
  OptimizeOrchTensors Pattern 1
- Replace local VarUseVisitor with common var_collectors::VarDefUseCollector
- Add documentation for both passes (en + zh-cn) and renumber pass docs
  to match pipeline execution order
- Add IncoreTileOps to required/produced in kOptimizeOrchTensorsProperties
  to enforce correct pass ordering (must run after ConvertTensorToTileOps)
- Renumber Pattern 4 → Pattern 3 in docs (old Pattern 3 was removed)
- Fix type stub docstring to mention both orchestration and InCore functions
Copilot AI review requested due to automatic review settings April 11, 2026 06:45
@coderabbitai

coderabbitai bot commented Apr 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR extracts orchestration-level buffer/layout optimizations from ConvertTensorToTileOps into a new OptimizeOrchTensors pass, adds its implementation, docs, Python bindings, tests, and updates pass ordering and registration; ConvertTensorToTileOps is simplified accordingly.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Pass Documentation Ordering | `.claude/rules/pass-doc-ordering.md` | Updated pass ordering: added `09-convert_tensor_to_tile_ops.md` and `10-optimize_orch_tensors.md`, renumbered subsequent entries and moved non-default strategy docs to 90/91. |
| Build System | `CMakeLists.txt` | Added `src/ir/transforms/optimize_orch_tensors_pass.cpp` to compiled sources. |
| English Docs | `docs/en/dev/passes/09-convert_tensor_to_tile_ops.md`, `docs/en/dev/passes/10-optimize_orch_tensors.md` | Added ConvertTensorToTileOps doc and new OptimizeOrchTensors doc describing pass APIs, preconditions, sequencing, and three optimization patterns. |
| Chinese Docs | `docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md`, `docs/zh-cn/dev/passes/10-optimize_orch_tensors.md` | Added Chinese equivalents for both pass docs. |
| Pass Properties | `include/pypto/ir/transforms/pass_properties.h` | Added `kOptimizeOrchTensorsProperties` with required/produced {SplitIncoreOrch, IncoreTileOps}. |
| Pass Declarations | `include/pypto/ir/transforms/passes.h` | Declared new exported pass `pass::OptimizeOrchTensors()` and its sequencing after ConvertTensorToTileOps. |
| Python Bindings & Stubs | `python/bindings/modules/passes.cpp`, `python/pypto/pypto_core/passes.pyi` | Added `passes.optimize_orch_tensors` factory binding and updated `.pyi` exports (including `optimize_orch_tensors`). |
| Pass Manager Integration | `python/pypto/ir/pass_manager.py`, `tests/ut/ir/transforms/test_pass_manager.py` | Registered OptimizeOrchTensors in `tensor_only_passes`, making it part of the default optimization strategy; tests updated to expect it. |
| ConvertTensorToTileOps Implementation | `src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp` | Removed in-pass orchestration analyses (`AnalyzeIterArgMappings`, `AnalyzeAssembleParentShapes`) and related helpers; simplified `TransformIncoreFunction` and variable-use detection; left matmul-slice collection local. |
| New Pass Implementation | `src/ir/transforms/optimize_orch_tensors_pass.cpp` | Added new program pass implementing three ordered optimizers: `IterArgReuseOptimizer`, `AssembleParentStridesOptimizer`, `AssembleLoopRewriter` (iter-arg merge to InOut, parent-stride propagation via TensorView, assemble-loop → `tile.store` rewrite). |
| Tests: ConvertTensorToTileOps | `tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py` | Adjusted/removed tests that previously asserted iter-arg/store-sinking and assemble-parent-stride effects; updated expected IR to reflect simpler ConvertTensorToTileOps outputs. |
| Tests: New Pass | `tests/ut/ir/transforms/test_optimize_orch_tensors.py` | Added comprehensive tests covering iter-arg reuse, assemble parent-stride enrichment, assemble-loop rewrite, loop hoisting, and edge cases. |

Sequence Diagram(s)

sequenceDiagram
    participant PM as Program
    participant PassMgr as PassManager
    participant OptPass as OptimizeOrchTensors
    participant ConvPass as ConvertTensorToTileOps
    participant IC as InCoreFunction
    participant Orch as Orchestration

    rect rgba(200,230,255,0.5)
    PM->>PassMgr: request optimization (Default)
    PassMgr->>ConvPass: run ConvertTensorToTileOps
    ConvPass->>IC: convert tensor.* → tile.* (matmul-slice prescan)
    end

    rect rgba(200,255,200,0.5)
    PassMgr->>OptPass: run OptimizeOrchTensors
    OptPass->>Orch: analyze tensor.create / assemble patterns in orchestration
    OptPass->>IC: rewrite InCore signatures (merge Out→InOut, update TensorView strides)
    OptPass->>Orch: update callsites (remove tensor.create, adjust calls)
    OptPass->>IC: rewrite assemble-loops → tile.store
    end

    PM->>PM: resulting optimized IR

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~85 minutes

Suggested reviewers

  • lyfne123

🐰 Hop! A pass split—so neat!
Iter-args find their match,
Strides line up in place,
Loops rewrite with grace,
Orchestration hums apace! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.96% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main refactoring: extracting OptimizeOrchTensors as a new pass and simplifying ConvertTensorToTileOps, which are the core changes in this PR. |
| Description check | ✅ Passed | The description is well-structured and directly related to the changeset, clearly explaining the motivation (issue #962), the three pattern classes extracted, simplifications made, and testing results. |

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the OptimizeOrchTensors pass, which optimizes tensor buffer usage between orchestration and InCore functions through three patterns: iter-arg reuse, assemble parent stride propagation, and assemble-loop rewriting. It also refactors ConvertTensorToTileOps by moving cross-function analyses to this new pass. The feedback focuses on improving the robustness of IR traversals in the new pass, specifically recommending the use of shared utilities like VarDefUseCollector to ensure nested control flow structures are correctly handled during variable use analysis and loop optimization.
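
The traversal concern in that feedback can be shown on a toy IR. This is a minimal sketch under stated assumptions: `Assign`, `For`, `If`, `collect_uses`, and `collect_uses_shallow` are hypothetical names invented for illustration and do not mirror the real `VarDefUseCollector` API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Assign:
    target: str
    uses: List[str]


@dataclass
class For:
    body: list


@dataclass
class If:
    cond_uses: List[str]
    then_body: list
    else_body: list = field(default_factory=list)


def collect_uses(stmts) -> Set[str]:
    """Collect every variable read in stmts, recursing into nested bodies."""
    used: Set[str] = set()
    for s in stmts:
        if isinstance(s, Assign):
            used.update(s.uses)
        elif isinstance(s, For):
            used |= collect_uses(s.body)
        elif isinstance(s, If):
            used.update(s.cond_uses)
            used |= collect_uses(s.then_body)
            used |= collect_uses(s.else_body)
    return used


def collect_uses_shallow(stmts) -> Set[str]:
    """Top-level-only scan: the kind of traversal that misses nested uses."""
    return {v for s in stmts if isinstance(s, Assign) for v in s.uses}


# `buf` is only read inside an if nested in a loop.
program = [
    Assign("a", ["x"]),
    For(body=[If(cond_uses=["c"], then_body=[Assign("b", ["buf"])])]),
]
deep = collect_uses(program)
shallow = collect_uses_shallow(program)
```

A hand-written scan that forgets one statement kind behaves like `collect_uses_shallow`: it reports `buf` as unused, which is unsafe when that answer drives buffer removal.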


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (1)
tests/ut/ir/transforms/test_optimize_orch_tensors.py (1)

21-236: Add a WhileStmt regression for Pattern 1.

The new pass docs and analyzer cover both ForStmt and WhileStmt, but this suite only exercises pl.range loops. A small while fixture would guard the VisitStmt_(const WhileStmtPtr&) path from drifting unnoticed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/ir/transforms/test_optimize_orch_tensors.py` around lines 21 - 236,
Add a new unit test method in the TestIterArgReuse class (e.g.,
test_while_iter_arg_reuse) that mirrors one of the existing ForStmt cases but
uses a WhileStmt-style loop in the main function to exercise the
VisitStmt_(const WhileStmtPtr&) path; keep the same main_incore_0 signature and
expected transformation (Out param merged into InOut), run After =
passes.optimize_orch_tensors()(Before) and call
ir.assert_structural_equal(After, Expected) so the optimize_orch_tensors pass
and the main/main_incore_0 pair are validated for a while-loop iter-arg
scenario.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/en/dev/passes/10-optimize_orch_tensors.md`:
- Around line 125-131: Update the pass contract table to include IncoreTileOps:
add IncoreTileOps to the Required column alongside SplitIncoreOrch and to the
Produced (or indicate as preserved) column so the doc reflects that the pass
requires and preserves IncoreTileOps to enforce ordering after
ConvertTensorToTileOps; reference the IncoreTileOps symbol and
ConvertTensorToTileOps in the brief note so readers understand why the contract
changed.

In `@docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md`:
- Around line 48-49: Update the docs to remove the claim that the assemble-loop
rewrite is performed in ConvertTensorToTileOps and instead point readers to the
later pass that performs it: change the sentence referencing "如果返回值来自
`tile.assemble` 循环,则将循环重写为直接使用 `tile.store`(assemble-loop 重写)" so it no longer
attributes the rewrite to `ConvertTensorToTileOps`, and add a brief note
directing readers to the `OptimizeOrchTensors` pass (and the term "assemble-loop
重写") as the component that actually performs this loop rewrite.

In `@docs/zh-cn/dev/passes/10-optimize_orch_tensors.md`:
- Around line 125-131: The Pass 属性表 is missing IncoreTileOps: update the table
so both the Required and Produced entries list the full set {SplitIncoreOrch,
IncoreTileOps} (i.e., replace the single "SplitIncoreOrch" value with the pair
that matches the implementation), ensuring the Pass attributes for the pass
named/represented by SplitIncoreOrch and IncoreTileOps in this document reflect
the actual implementation.

In `@src/ir/transforms/optimize_orch_tensors_pass.cpp`:
- Around line 1076-1124: The loop rewrite assumes op->iter_args_[0]->initValue_
is a disposable tensor created by tensor.create but doesn't verify it; update
VisitStmt_(const ForStmtPtr& op) to check that op->iter_args_[0]->initValue_ is
an Assign/Var whose defining statement is a tensor.create before inserting into
dead_create_vars_. Specifically, locate the use of iter_args_[0]->initValue_
(and the code that sets init_var and dead_create_vars_) and: 1) resolve
initValue_ to its defining AssignStmt/Call and confirm the call->op_->name_ ==
"tensor.create"; 2) optionally ensure the created Var has no other uses (reuse
StmtUsesVar or similar) before marking it dead; only then insert init_var.get()
into dead_create_vars_, otherwise skip marking it.
- Around line 326-329: The current code stores reuse_result into results_ keyed
only by fname causing the first matching caller to overwrite the callee
signature for all call sites; change this by either (A) verifying that every
call site of the callee yields the identical reuse_result.merges set before
writing to results_ (use the module's call graph / iterate all callers and
compare merge sets) and only then store results_[fname] = reuse_result and call
RewriteIncore()/CallSiteRewriter(), or (B) avoid global mutation and keep the
rewrite local: do not insert into results_ for fname, instead record
per-callsite rewrites (keyed by the caller or CallSite id) and only modify that
caller/callsite; update the logic around results_, reuse_result,
RewriteIncore(), and CallSiteRewriter() accordingly (same change also for the
analogous block at lines ~460-479).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0a159f65-9ebf-4c15-b0a6-b8fce2ed2b04

📥 Commits

Reviewing files that changed from the base of the PR and between b2104ad and e526c60.

📒 Files selected for processing (30)
  • .claude/rules/pass-doc-ordering.md
  • CMakeLists.txt
  • docs/en/dev/passes/09-convert_tensor_to_tile_ops.md
  • docs/en/dev/passes/10-optimize_orch_tensors.md
  • docs/en/dev/passes/11-flatten_tile_nd_to_2d.md
  • docs/en/dev/passes/14-expand_mixed_kernel.md
  • docs/en/dev/passes/15-init_memref.md
  • docs/en/dev/passes/16-memory_reuse.md
  • docs/en/dev/passes/17-allocate_memory_addr.md
  • docs/en/dev/passes/90-insert_sync.md
  • docs/en/dev/passes/91-utility_passes.md
  • docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md
  • docs/zh-cn/dev/passes/10-optimize_orch_tensors.md
  • docs/zh-cn/dev/passes/11-flatten_tile_nd_to_2d.md
  • docs/zh-cn/dev/passes/14-expand_mixed_kernel.md
  • docs/zh-cn/dev/passes/15-init_memref.md
  • docs/zh-cn/dev/passes/16-memory_reuse.md
  • docs/zh-cn/dev/passes/17-allocate_memory_addr.md
  • docs/zh-cn/dev/passes/90-insert_sync.md
  • docs/zh-cn/dev/passes/91-utility_passes.md
  • include/pypto/ir/transforms/pass_properties.h
  • include/pypto/ir/transforms/passes.h
  • python/bindings/modules/passes.cpp
  • python/pypto/ir/pass_manager.py
  • python/pypto/pypto_core/passes.pyi
  • src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp
  • src/ir/transforms/optimize_orch_tensors_pass.cpp
  • tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py
  • tests/ut/ir/transforms/test_optimize_orch_tensors.py
  • tests/ut/ir/transforms/test_pass_manager.py

Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the IR transform pipeline by extracting cross-function buffer/tensor optimizations out of ConvertTensorToTileOps into a new standalone OptimizeOrchTensors pass, leaving ConvertTensorToTileOps as a largely mechanical tensor→tile lowering and call-site update pass. It updates C++/Python pass registration, pass-manager ordering, adds focused unit tests for the new pass, and renumbers/extends pass documentation to match the updated pipeline.

Changes:

  • Introduce OptimizeOrchTensors (new pass) implementing iter-arg reuse, assemble parent-stride propagation, and assemble-loop rewrite.
  • Simplify ConvertTensorToTileOps by removing iter-arg/assemble analyses and parameter-direction inference, while keeping conversion-time MatmulSlice handling.
  • Update bindings, pass manager, tests, and English/Chinese pass docs (including doc ordering renumbering).

Reviewed changes

Copilot reviewed 16 out of 30 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| `src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp` | Removes bundled buffer optimizations/analyses; keeps core tensor→tile conversion and assemble-loop conversion behavior. |
| `src/ir/transforms/optimize_orch_tensors_pass.cpp` | New pass implementing the extracted orchestration/InCore tensor buffer optimizations (Patterns 1–3). |
| `include/pypto/ir/transforms/passes.h` | Declares the new OptimizeOrchTensors pass and documents intended behavior/order. |
| `include/pypto/ir/transforms/pass_properties.h` | Adds pass properties for OptimizeOrchTensors to enforce correct pipeline ordering. |
| `python/bindings/modules/passes.cpp` | Exposes `optimize_orch_tensors` in Python bindings with a pass description. |
| `python/pypto/ir/pass_manager.py` | Registers OptimizeOrchTensors in the default pipeline ordering after ConvertTensorToTileOps. |
| `python/pypto/pypto_core/passes.pyi` | Adds stub for `optimize_orch_tensors` and exports it via `__all__`. |
| `CMakeLists.txt` | Adds the new C++ pass source to the build. |
| `tests/ut/ir/transforms/test_pass_manager.py` | Updates expected pass order to include OptimizeOrchTensors. |
| `tests/ut/ir/transforms/test_optimize_orch_tensors.py` | New unit tests covering the three OptimizeOrchTensors patterns and edge cases. |
| `tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py` | Updates/realigns tests to validate “naive” ConvertTensorToTileOps behavior post-extraction. |
| `docs/en/dev/passes/09-convert_tensor_to_tile_ops.md` | New/updated documentation describing ConvertTensorToTileOps’ simplified responsibilities. |
| `docs/en/dev/passes/10-optimize_orch_tensors.md` | New documentation for OptimizeOrchTensors patterns and intended placement. |
| `docs/en/dev/passes/11-flatten_tile_nd_to_2d.md` | Renumbered/added doc to match pipeline ordering. |
| `docs/en/dev/passes/14-expand_mixed_kernel.md` | Renumbered doc to match pipeline ordering. |
| `docs/en/dev/passes/15-init_memref.md` | Renumbered doc to match pipeline ordering. |
| `docs/en/dev/passes/16-memory_reuse.md` | Renumbered doc to match pipeline ordering. |
| `docs/en/dev/passes/17-allocate_memory_addr.md` | Renumbered doc to match pipeline ordering. |
| `docs/en/dev/passes/90-insert_sync.md` | Added/renumbered documentation for InsertSync. |
| `docs/en/dev/passes/91-utility_passes.md` | Added/renumbered documentation for utility passes. |
| `docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md` | Chinese documentation for ConvertTensorToTileOps aligned with the refactor. |
| `docs/zh-cn/dev/passes/10-optimize_orch_tensors.md` | Chinese documentation for OptimizeOrchTensors aligned with the refactor. |
| `docs/zh-cn/dev/passes/11-flatten_tile_nd_to_2d.md` | Chinese doc added/renumbered to match pipeline ordering. |
| `docs/zh-cn/dev/passes/14-expand_mixed_kernel.md` | Chinese doc added/renumbered to match pipeline ordering. |
| `docs/zh-cn/dev/passes/15-init_memref.md` | Chinese doc added/renumbered to match pipeline ordering. |
| `docs/zh-cn/dev/passes/16-memory_reuse.md` | Chinese doc added/renumbered to match pipeline ordering. |
| `docs/zh-cn/dev/passes/17-allocate_memory_addr.md` | Chinese doc added/renumbered to match pipeline ordering. |
| `docs/zh-cn/dev/passes/90-insert_sync.md` | Chinese documentation for InsertSync added. |
| `docs/zh-cn/dev/passes/91-utility_passes.md` | Chinese documentation for utility passes added. |
| `.claude/rules/pass-doc-ordering.md` | Updates the doc numbering map to reflect the new pipeline order. |

…ents

Restore UpgradeWrittenTensorParamDirections in ConvertTensorToTileOps
that was accidentally removed during the OptimizeOrchTensors extraction.
This fixes CI failures where InCore params written via tile.store were
not upgraded from In to Out/InOut, causing codegen to emit add_input
instead of add_output/add_inout.

Also addresses PR review feedback:
- Replace manual StmtUsesVar with VarDefUseCollector (handles all stmts)
- Fix pass properties tables in docs (add IncoreTileOps)
- Clarify assemble-loop rewrite attribution in docs
- Remove "future" qualifier from Pattern 2 test docstring
- Update test expectation for tensor.write param direction

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (2)
src/ir/transforms/optimize_orch_tensors_pass.cpp (2)

327-330: ⚠️ Potential issue | 🔴 Critical

Don't globalize a reuse decision from the first matching caller.

results_ is keyed only by callee name, so the first loop that matches rewrites that InCore signature for every call site. Any later caller that does not satisfy the same merges set will still lose those Out args and start reusing/mutating its input buffer in place. Please either prove that all callers agree on the exact merge set before storing here, or keep this optimization scoped to the matching call sites only.
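
Option (A) from this comment, checking that all callers agree before storing a per-callee result, might look roughly like the sketch below. It is a Python model of the idea, not the C++ pass: `agreed_merges`, the `(out_param_index, in_param_index)` encoding of a merge, and the callee name `main_incore_1` are all hypothetical.

```python
from collections import defaultdict


def agreed_merges(call_sites):
    """Map callee -> merge set, keeping only callees whose call sites all agree.

    call_sites: iterable of (callee_name, merges), where merges is a set of
    (out_param_index, in_param_index) pairs detected at that call site.
    """
    per_callee = defaultdict(list)
    for callee, merges in call_sites:
        per_callee[callee].append(frozenset(merges))

    results = {}
    for callee, merge_sets in per_callee.items():
        first = merge_sets[0]
        # Store only a non-empty merge set that every call site shares.
        if first and all(m == first for m in merge_sets):
            results[callee] = first
    return results


sites = [
    ("main_incore_0", {(2, 0)}),
    ("main_incore_0", {(2, 0)}),
    ("main_incore_1", {(1, 0)}),  # matches the pattern at this call site
    ("main_incore_1", set()),     # but not at this one, so no rewrite
]
results = agreed_merges(sites)
```

Only `main_incore_0` ends up in `results`; the callee with disagreeing call sites keeps its original signature, which is the conservative outcome the comment asks for.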

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/transforms/optimize_orch_tensors_pass.cpp` around lines 327 - 330, The
current code stores reuse_result into results_ keyed only by fname when
reuse_result.merges is non-empty, which globalizes a caller-specific merge
decision; change this so we either (A) validate that every call site for fname
has an identical reuse_result.merges before writing into results_ (iterate call
sites and compare merges sets) or (B) avoid modifying results_ and instead apply
reuse_result only to the specific matching call site(s) (i.e., keep the
optimization scoped locally rather than storing results_[fname] =
std::move(reuse_result)); use the existing symbols results_, reuse_result, fname
and the merges set to implement the chosen fix.

1077-1080: ⚠️ Potential issue | 🔴 Critical

Only rewrite/delete a loop seed when it is a dead tensor.create.

This matcher currently switches the iter-arg init to the Out param for any var-like seed and immediately records that seed in dead_create_vars_. If the seed comes from a real tensor value, or is referenced later, the rewrite changes semantics and LoopRewriteMutator will also erase a non-tensor.create assignment. Please require a tensor.create definition and prove the seed has no remaining uses before inserting it into the dead set.
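
The guard requested here reduces to two checks on the seed variable. The sketch below models them in Python under loud assumptions: `is_dead_create` is a hypothetical helper, the `defs` and `use_counts` maps stand in for whatever def/use information the pass has, `tensor.add` is just a placeholder for "some other defining op", and a use count of 1 is assumed to mean the loop init is the only reader.

```python
def is_dead_create(seed, defs, use_counts):
    """True only if seed is defined by tensor.create and read solely as the loop init."""
    return defs.get(seed) == "tensor.create" and use_counts.get(seed, 0) == 1


# Hypothetical def/use facts for three candidate loop seeds.
defs = {"buf": "tensor.create", "acc": "tensor.add", "shared": "tensor.create"}
use_counts = {"buf": 1, "acc": 1, "shared": 2}

dead = {v for v in defs if is_dead_create(v, defs, use_counts)}
```

Here only `buf` qualifies: `acc` carries a real tensor value, and `shared` is read elsewhere, so neither may be rewritten or erased.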

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/transforms/optimize_orch_tensors_pass.cpp` around lines 1077 - 1080,
The code currently flips the loop iter-arg init to the Out param and
unconditionally inserts the seed Var (init_var from
As<Var>(op->iter_args_[0]->initValue_)) into dead_create_vars_, which can
rewrite non-tensor.create seeds; change this to only perform the rewrite and
insert into dead_create_vars_ when the seed is proven to be a tensor.create with
no remaining uses. Concretely: after obtaining init_var, resolve its defining
expression (the Var's definition) and verify it's a tensor.create call (e.g.,
the defining node is a Call/Invoke of "tensor.create" or equivalent), and then
prove the Var has zero remaining uses before mutating iter_args_[0]->initValue_
and inserting init_var.get() into dead_create_vars_; otherwise leave the
iter-arg and do not record the var so LoopRewriteMutator won't erase
non-tensor.create assignments.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f2e76331-633b-4523-afc4-a41199c0512b

📥 Commits

Reviewing files that changed from the base of the PR and between e526c60 and ef273ea.

📒 Files selected for processing (7)
  • docs/en/dev/passes/09-convert_tensor_to_tile_ops.md
  • docs/en/dev/passes/10-optimize_orch_tensors.md
  • docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md
  • docs/zh-cn/dev/passes/10-optimize_orch_tensors.md
  • src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp
  • src/ir/transforms/optimize_orch_tensors_pass.cpp
  • tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py
✅ Files skipped from review due to trivial changes (4)
  • docs/en/dev/passes/10-optimize_orch_tensors.md
  • docs/en/dev/passes/09-convert_tensor_to_tile_ops.md
  • docs/zh-cn/dev/passes/10-optimize_orch_tensors.md
  • docs/zh-cn/dev/passes/09-convert_tensor_to_tile_ops.md

@lyfne123 lyfne123 merged commit ba0f363 into hw-native-sys:main Apr 11, 2026
14 of 15 checks passed
@Hzfengsy Hzfengsy deleted the worktree-ConvertTensorToTileOps branch April 11, 2026 13:52