
refactor(ir): unify memory space requirements into OpConversionRegistry #974

Open
Hzfengsy wants to merge 2 commits into hw-native-sys:main from Hzfengsy:refactor/unify-input-space-reqs

Conversation

@Hzfengsy
Member

Summary

  • Add InputSpaceReq and ConversionEntry to OpConversionRegistry, allowing converters to declare per-input memory space requirements as metadata
  • Replace the special-purpose MatmulSlicePatternCollector with a general ConsumerSpaceCollector driven by registered converter metadata
  • Add framework auto-bridging in TensorToTileMutator that automatically loads TensorType args to the required memory space before calling converters
  • Simplify matmul/matmul_acc converters from imperative load+compute to pure compute-op emitters
  • Remove LoadOperandToMat helper (no longer needed)
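The new metadata types could look roughly like the sketch below. Only the names InputSpaceReq, ConversionEntry, and the per-input input_reqs list come from this PR; the MemorySpace enumerators and the ConversionFunc signature are placeholder assumptions, not the real pypto definitions.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for pypto's memory spaces.
enum class MemorySpace { Vec, Mat, Acc };

struct InputSpaceReq {
  MemorySpace space;                           // space this input must occupy
  std::optional<std::string> transpose_kwarg;  // kwarg naming a transpose flag, if any
};

// Placeholder converter signature; the real ConversionFunc differs.
using ConversionFunc = std::function<void()>;

struct ConversionEntry {
  ConversionFunc func;                    // pure compute-op emitter
  std::vector<InputSpaceReq> input_reqs;  // one requirement per tensor input
};
```

With requirements carried as plain data next to the converter, both the pre-scan collector and the bridging step can read them without op-specific code.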

Motivation

Memory space handling was spread across three independent mechanisms (Phase-1 entry loads, MatmulSlicePatternCollector, LoadOperandToMat) with a hardcoded kSelfLoadingOps exclusion list. Adding any new op that needs non-Vec inputs would require a new special-case collector. This refactoring unifies the approach: converters declare what they need, and the framework handles the rest.

Closes #972

Testing

  • All 3454 unit tests pass
  • All 51 test_convert_tensor_to_tile_ops tests pass (including slice→matmul patterns)
  • clang-tidy clean
  • Code review completed

Move per-input memory space declarations into OpConversionRegistry via
InputSpaceReq, replacing the special-purpose MatmulSlicePatternCollector
with a general ConsumerSpaceCollector driven by registered metadata.

The framework now auto-bridges TensorType args to the required memory
space before calling converters, eliminating LoadOperandToMat and
simplifying matmul/matmul_acc converters to pure compute-op emitters.
Copilot AI review requested due to automatic review settings April 11, 2026 15:55
@coderabbitai

coderabbitai bot commented Apr 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 007da75c-fa30-450e-9509-22978b989bb7

📥 Commits

Reviewing files that changed from the base of the PR and between c7ea698 and 96dfebb.

📒 Files selected for processing (2)
  • src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp
  • src/ir/transforms/op_conversion_registry.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/ir/transforms/op_conversion_registry.cpp
  • src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp

📝 Walkthrough

Walkthrough

Extended the op conversion registry with per-input memory-space requirements (InputSpaceReq, ConversionEntry) and updated the tensor→tile conversion pass to pre-scan consumer space needs and automatically bridge argument memory spaces via synthetic tile.load/tile.move before invoking converters.

Changes

Cohort / File(s) Summary
Registry Types & APIs
include/pypto/ir/transforms/op_conversion_registry.h, src/ir/transforms/op_conversion_registry.cpp
Added InputSpaceReq (MemorySpace + optional transpose kwarg) and ConversionEntry (ConversionFunc + input_reqs); extended RegisterSimple/RegisterCustom to accept input_reqs; Lookup now returns const ConversionEntry*; stored conversions map updated; removed LoadOperandToMat.
Conversion Pass & Mutator
src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp
Replaced matmul-specific pre-scan with ConsumerSpaceCollector; collect per-variable required MemorySpace/transpose from registry metadata; added BridgeInputSpaces to synthesize tile.load/tile.move prologue for required input spaces and rewrite call args; TensorToTileMutator uses bridged args and merges bridge prologue + converter prologue; removed matmul-only slice logic.
Op Registration Changes
src/ir/transforms/op_conversion_registry.cpp
Updated tensor.matmul and tensor.matmul_acc registrations to declare per-input input_reqs (Mat space + transpose kwarg names) and to emit pure tile.* compute ops without embedded load prologues.
Headers / Includes
include/pypto/ir/transforms/op_conversion_registry.h, src/ir/transforms/...
Added includes: <cstddef>, <optional>, <unordered_map> and pypto/ir/memory_space.h to support new types and maps.
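The registration and lookup flow described above might be sketched as follows. This is a minimal stand-in, not the real pypto API: the RegisterSimple signature, the "transpose_a"/"transpose_b" kwarg names, and the helper types are assumptions; only the general shape (RegisterSimple taking input_reqs, Lookup returning a const ConversionEntry*) follows the walkthrough.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class MemorySpace { Vec, Mat };

struct InputSpaceReq {
  MemorySpace space;
  std::optional<std::string> transpose_kwarg;
};

using ConversionFunc = std::function<void()>;  // placeholder signature

struct ConversionEntry {
  ConversionFunc func;
  std::vector<InputSpaceReq> input_reqs;
};

class OpConversionRegistry {
 public:
  // RegisterSimple now also records per-input requirements as metadata.
  void RegisterSimple(const std::string& op, ConversionFunc func,
                      std::vector<InputSpaceReq> input_reqs = {}) {
    conversions_[op] = ConversionEntry{std::move(func), std::move(input_reqs)};
  }
  // Lookup returns the whole entry so the pass sees input_reqs too.
  const ConversionEntry* Lookup(const std::string& op) const {
    auto it = conversions_.find(op);
    return it == conversions_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, ConversionEntry> conversions_;
};
```

Under this scheme a tensor.matmul registration would declare both operands as Mat-space inputs and emit only the tile.* compute op, leaving all loads to the framework.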

Sequence Diagram(s)

sequenceDiagram
    participant Pass as Conversion Pass
    participant Registry as OpConversionRegistry
    participant Collector as ConsumerSpaceCollector
    participant Mutator as TensorToTileMutator
    participant Bridge as BridgeInputSpaces
    participant Converter as Op Converter

    Pass->>Collector: Pre-scan IR for consumer needs
    Collector->>Registry: Lookup ConversionEntry (func + input_reqs)
    Registry-->>Collector: Return input_reqs per-op
    Collector-->>Pass: Return var→(MemorySpace, transpose) map

    Pass->>Mutator: Start mutating function (with consumer map)

    Note over Mutator: For each converted call
    Mutator->>Bridge: BridgeInputSpaces(call args, input_reqs)
    Bridge->>Bridge: For each arg: check actual vs required space
    Bridge->>Bridge: Emit synthetic `tile.load`/`tile.move` as needed
    Bridge-->>Mutator: Return bridged args + bridge prologue

    Mutator->>Converter: Invoke converter func with bridged args
    Converter->>Converter: Emit tile.* compute call (no loads)
    Converter-->>Mutator: Return result + converter prologue

    Mutator->>Mutator: Merge bridge prologue + converter prologue + result
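The bridging step in the diagram amounts to a per-argument space check. The sketch below illustrates the idea with toy Arg/Stmt stand-ins; real pypto IR values, the tile.load/tile.move emission, and the "_mat" naming are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class MemorySpace { Vec, Mat };

// Toy stand-ins for IR values and statements; real pypto types differ.
struct Arg { std::string name; MemorySpace space; };
struct Stmt { std::string text; };

struct InputSpaceReq { MemorySpace space; };

struct BridgeResult {
  std::vector<Arg> args;       // possibly rewritten call args
  std::vector<Stmt> prologue;  // synthetic load statements
};

// For each arg whose actual space differs from the declared requirement,
// emit a synthetic load into the required space and substitute its result;
// args that already match (or have no requirement) pass through unchanged.
BridgeResult BridgeInputSpaces(const std::vector<Arg>& args,
                               const std::vector<InputSpaceReq>& reqs) {
  BridgeResult out;
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (i < reqs.size() && args[i].space != reqs[i].space) {
      out.prologue.push_back({"tile.load " + args[i].name});
      out.args.push_back({args[i].name + "_mat", reqs[i].space});
    } else {
      out.args.push_back(args[i]);
    }
  }
  return out;
}
```

The converter then sees only correctly placed inputs, which is what lets the matmul converters shrink to pure compute emitters.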

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • lyfne123

Poem

🐰 I hop through code with careful paws,

Bridging spaces, obeying laws.
No more special-case, no tangled mat;
Inputs aligned — a tidy format.
A registry hums, converters sing, hop! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 20.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Title check (✅ Passed): the title clearly and concisely summarizes the main change, unifying memory space requirements into OpConversionRegistry, which is the central refactoring theme across all modified files.
  • Description check (✅ Passed): the description provides relevant context, including the motivation for the refactoring, the main components added or modified, and testing outcomes; it clearly relates to the changeset.
  • Linked Issues check (✅ Passed): all core requirements from #972 are met: InputSpaceReq and ConversionEntry types added, OpConversionRegistry extended with input_reqs, ConsumerSpaceCollector implemented, auto-bridging in TensorToTileMutator, matmul converters simplified, and special-case mechanisms removed.
  • Out of Scope Changes check (✅ Passed): all changes directly support the unification objective: new registry types, converter simplifications, collector replacement, and bridging logic; no extraneous modifications detected.




Contributor

Copilot AI left a comment


Pull request overview

This PR refactors convert_tensor_to_tile_ops to centralize per-input memory space requirements in OpConversionRegistry, enabling framework-driven memory space bridging and replacing the special-cased tensor.slice → tensor.matmul look-ahead with a metadata-driven collector.

Changes:

  • Extend OpConversionRegistry to store converter functions alongside per-input InputSpaceReq metadata.
  • Replace MatmulSlicePatternCollector with a general ConsumerSpaceCollector that reads input space requirements from the registry.
  • Add call-site auto-bridging in TensorToTileMutator (currently implemented as TensorType tile.load insertion) and simplify matmul converters to pure compute emitters.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
include/pypto/ir/transforms/op_conversion_registry.h Introduces InputSpaceReq/ConversionEntry and updates registration/lookup APIs.
src/ir/transforms/op_conversion_registry.cpp Migrates registry storage to ConversionEntry and registers matmul input space requirements.
src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp Adds consumer-space pre-scan, wires registry entries into conversion, and implements input bridging + consumer-driven slice load overrides.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request generalizes the mechanism for handling memory space requirements during tensor-to-tile conversion. It replaces the specialized MatmulSlicePatternCollector with a metadata-driven ConsumerSpaceCollector that utilizes InputSpaceReq definitions in the OpConversionRegistry. Key additions include the BridgeInputSpaces utility for automatic tile.load insertion and updates to tensor.matmul and tensor.matmul_acc to leverage this framework. Review feedback highlights a potential issue where skipping operations with any input requirements might lead to type mismatches for arguments without requirements, such as accumulators. Additionally, it is suggested to refine the 'first consumer wins' strategy to better handle conflicting memory space requirements by prioritizing specialized spaces over the default.

- Add GlobalVar guard in ConsumerSpaceCollector for consistency with
  TensorArgsInConvertedOpsCollector and TensorToTileMutator.
- Prioritize non-Vec memory spaces in ConsumerSpaceCollector so a Vec
  requirement never shadows a later Mat/Left/Right/Acc/Bias requirement.
- Update stale comment above tensor.matmul / tensor.matmul_acc converters
  to reflect the new framework auto-bridging responsibility split.
- Phase-1 collector now excludes args per-index (not whole call) so
  matmul_acc's acc arg still receives a Phase-1 load when needed.
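The second follow-up item (a Vec requirement must never shadow a later specialized requirement) reduces to a small merge rule when the collector records a variable's required space. A sketch, assuming Vec is the default space and RecordRequirement is a hypothetical helper name:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

enum class MemorySpace { Vec, Mat, Left, Right, Acc, Bias };

// Record one consumer's requirement for `var`. A specialized (non-Vec)
// space always overrides Vec, so the order in which consumers are
// visited cannot hide a Mat/Left/Right/Acc/Bias requirement.
void RecordRequirement(std::unordered_map<std::string, MemorySpace>& required,
                       const std::string& var, MemorySpace space) {
  auto it = required.find(var);
  if (it == required.end() ||
      (it->second == MemorySpace::Vec && space != MemorySpace::Vec)) {
    required[var] = space;
  }
}
```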

Development

Successfully merging this pull request may close these issues.

[RFC] Unify memory space requirements into OpConversionRegistry
