refactor(ir): unify memory space requirements into OpConversionRegistry #974
Hzfengsy wants to merge 2 commits into hw-native-sys:main
Conversation
Move per-input memory space declarations into OpConversionRegistry via InputSpaceReq, replacing the special-purpose MatmulSlicePatternCollector with a general ConsumerSpaceCollector driven by registered metadata. The framework now auto-bridges TensorType args to the required memory space before calling converters, eliminating LoadOperandToMat and simplifying matmul/matmul_acc converters to pure compute-op emitters.
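As context, the registry shape this describes might look roughly like the following minimal sketch. The names `InputSpaceReq`, `ConversionEntry`, `OpConversionRegistry`, and the memory-space values come from this PR, but the member layout, converter signature, and string-keyed lookup are illustrative assumptions, not the actual pypto code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed set of memory spaces; the real enum lives in the pypto IR.
enum class MemorySpace { Vec, Mat, Left, Right, Acc, Bias };

// Per-input requirement: which memory space the converter expects the
// argument to live in before it is invoked.
struct InputSpaceReq {
  MemorySpace space;
  bool transpose = false;  // e.g. a matmul operand that must be transposed
};

// Toy converter signature: real converters consume and produce IR nodes.
using ConverterFn = std::function<std::string(const std::vector<std::string>&)>;

// A registered conversion: the converter plus its per-input metadata.
struct ConversionEntry {
  ConverterFn func;
  std::vector<InputSpaceReq> input_reqs;  // indexed by argument position
};

class OpConversionRegistry {
 public:
  void Register(const std::string& op, ConverterFn fn,
                std::vector<InputSpaceReq> reqs) {
    entries_[op] = ConversionEntry{std::move(fn), std::move(reqs)};
  }
  // Returns nullptr when the op has no registered conversion.
  const ConversionEntry* Lookup(const std::string& op) const {
    auto it = entries_.find(op);
    return it == entries_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, ConversionEntry> entries_;
};
```

With this shape, both the collector and the mutator can ask one source of truth for an op's input requirements instead of hardcoding per-op knowledge.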
📝 Walkthrough
Extended the op conversion registry with per-input memory-space requirements (InputSpaceReq, ConversionEntry) and updated the tensor→tile conversion pass to pre-scan consumer space needs and automatically bridge argument memory spaces via synthetic tile.load/tile.move before invoking converters.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Pass as Conversion Pass
    participant Registry as OpConversionRegistry
    participant Collector as ConsumerSpaceCollector
    participant Mutator as TensorToTileMutator
    participant Bridge as BridgeInputSpaces
    participant Converter as Op Converter
    Pass->>Collector: Pre-scan IR for consumer needs
    Collector->>Registry: Lookup ConversionEntry (func + input_reqs)
    Registry-->>Collector: Return input_reqs per-op
    Collector-->>Pass: Return var→(MemorySpace, transpose) map
    Pass->>Mutator: Start mutating function (with consumer map)
    Note over Mutator: For each converted call
    Mutator->>Bridge: BridgeInputSpaces(call args, input_reqs)
    Bridge->>Bridge: For each arg: check actual vs required space
    Bridge->>Bridge: Emit synthetic tile.load/tile.move as needed
    Bridge-->>Mutator: Return bridged args + bridge prologue
    Mutator->>Converter: Invoke converter func with bridged args
    Converter->>Converter: Emit tile.* compute call (no loads)
    Converter-->>Mutator: Return result + converter prologue
    Mutator->>Mutator: Merge bridge prologue + converter prologue + result
```
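The bridging step in the diagram can be sketched in simplified form. This is a toy model: the real BridgeInputSpaces operates on IR expressions and chooses between `tile.load` and `tile.move` based on the actual and required spaces, whereas this sketch works on named arguments and emits a single synthetic move string per mismatch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed memory-space enum; the real one lives in the pypto IR.
enum class MemorySpace { Vec, Mat, Left, Right, Acc, Bias };

// Toy stand-in for an IR argument: a name plus its current memory space.
struct Arg {
  std::string name;
  MemorySpace space;
};

struct BridgeResult {
  std::vector<Arg> args;              // arguments after bridging
  std::vector<std::string> prologue;  // synthetic load/move ops to prepend
};

// For each argument whose actual space differs from the declared
// requirement, emit a synthetic move into a fresh temporary and pass
// that temporary to the converter instead of the original argument.
BridgeResult BridgeInputSpaces(const std::vector<Arg>& args,
                               const std::vector<MemorySpace>& reqs) {
  BridgeResult out;
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (i < reqs.size() && args[i].space != reqs[i]) {
      Arg bridged{args[i].name + ".bridged", reqs[i]};
      out.prologue.push_back("tile.move " + bridged.name + " <- " +
                             args[i].name);
      out.args.push_back(bridged);
    } else {
      out.args.push_back(args[i]);  // already in the required space
    }
  }
  return out;
}
```

Because the bridge returns a prologue rather than mutating in place, the mutator can merge it with the converter's own prologue, matching the final merge step in the diagram.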
Pull request overview
This PR refactors convert_tensor_to_tile_ops to centralize per-input memory space requirements in OpConversionRegistry, enabling framework-driven memory space bridging and replacing the special-cased tensor.slice → tensor.matmul look-ahead with a metadata-driven collector.
Changes:
- Extend `OpConversionRegistry` to store converter functions alongside per-input `InputSpaceReq` metadata.
- Replace `MatmulSlicePatternCollector` with a general `ConsumerSpaceCollector` that reads input space requirements from the registry.
- Add call-site auto-bridging in `TensorToTileMutator` (currently implemented as `tile.load` insertion for `TensorType` args) and simplify matmul converters to pure compute emitters.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `include/pypto/ir/transforms/op_conversion_registry.h` | Introduces `InputSpaceReq`/`ConversionEntry` and updates registration/lookup APIs. |
| `src/ir/transforms/op_conversion_registry.cpp` | Migrates registry storage to `ConversionEntry` and registers matmul input space requirements. |
| `src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp` | Adds consumer-space pre-scan, wires registry entries into conversion, and implements input bridging plus consumer-driven slice load overrides. |
Code Review
This pull request generalizes the mechanism for handling memory space requirements during tensor-to-tile conversion. It replaces the specialized MatmulSlicePatternCollector with a metadata-driven ConsumerSpaceCollector that utilizes InputSpaceReq definitions in the OpConversionRegistry. Key additions include the BridgeInputSpaces utility for automatic tile.load insertion and updates to tensor.matmul and tensor.matmul_acc to leverage this framework. Review feedback highlights a potential issue where skipping operations with any input requirements might lead to type mismatches for arguments without requirements, such as accumulators. Additionally, it is suggested to refine the 'first consumer wins' strategy to better handle conflicting memory space requirements by prioritizing specialized spaces over the default.
- Add GlobalVar guard in ConsumerSpaceCollector for consistency with TensorArgsInConvertedOpsCollector and TensorToTileMutator.
- Prioritize non-Vec memory spaces in ConsumerSpaceCollector so a Vec requirement never shadows a later Mat/Left/Right/Acc/Bias requirement.
- Update stale comment above tensor.matmul / tensor.matmul_acc converters to reflect the new framework auto-bridging responsibility split.
- Phase-1 collector now excludes args per-index (not whole call) so matmul_acc's acc arg still receives a Phase-1 load when needed.
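The non-Vec prioritization described above could be implemented along these lines. This is a hypothetical sketch: the function name `RecordRequirement` and the map-based collector state are assumptions, but the merge rule follows the stated behavior (a specialized space overrides Vec, and the first specialized requirement seen wins).

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumed memory-space enum; the real one lives in the pypto IR.
enum class MemorySpace { Vec, Mat, Left, Right, Acc, Bias };

// Merge one consumer's requirement into the collector's var→space map:
// - first requirement seen for a var is recorded as-is;
// - a specialized space (Mat/Left/Right/Acc/Bias) replaces a default Vec;
// - among specialized spaces, the first one seen wins.
void RecordRequirement(std::map<std::string, MemorySpace>& needs,
                       const std::string& var, MemorySpace req) {
  auto it = needs.find(var);
  if (it == needs.end()) {
    needs.emplace(var, req);  // first consumer of this var
  } else if (it->second == MemorySpace::Vec && req != MemorySpace::Vec) {
    it->second = req;  // specialized space shadows the Vec default
  }
  // otherwise keep the existing (specialized) requirement
}
```

This keeps the simple "first consumer wins" behavior while preventing a Vec consumer that happens to be scanned first from hiding a later matmul-style requirement.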
Summary

- Add `InputSpaceReq` and `ConversionEntry` to `OpConversionRegistry`, allowing converters to declare per-input memory space requirements as metadata
- Replace `MatmulSlicePatternCollector` with a general `ConsumerSpaceCollector` driven by registered converter metadata
- Add auto-bridging in `TensorToTileMutator` that automatically loads `TensorType` args to the required memory space before calling converters
- Remove the `LoadOperandToMat` helper (no longer needed)

Motivation

Memory space handling was spread across three independent mechanisms (Phase-1 entry loads, `MatmulSlicePatternCollector`, `LoadOperandToMat`) with a hardcoded `kSelfLoadingOps` exclusion list. Adding any new op that needs non-Vec inputs would require a new special-case collector. This refactoring unifies the approach: converters declare what they need, and the framework handles the rest.

Closes #972
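The replacement for the hardcoded kSelfLoadingOps list can be sketched as a per-argument-index decision. The optional-per-index encoding and the function name `NeedsPhase1Load` here are assumptions for illustration; the point is that the default entry load is skipped only for positions with a declared requirement (which the bridge loads later), not for the whole call.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Assumed memory-space enum; the real one lives in the pypto IR.
enum class MemorySpace { Vec, Mat, Left, Right, Acc, Bias };

// Decide whether argument arg_index still needs the default Phase-1 entry
// load. Positions without a declared requirement (e.g. an accumulator-style
// arg in this sketch) keep the default load; positions with a requirement
// are left to the auto-bridge.
bool NeedsPhase1Load(const std::vector<std::optional<MemorySpace>>& reqs,
                     std::size_t arg_index) {
  return arg_index >= reqs.size() || !reqs[arg_index].has_value();
}
```

Compared to a name-based exclusion set, this scales to new ops automatically: registering requirements is the only step needed.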
Testing

- `test_convert_tensor_to_tile_ops` tests pass (including slice→matmul patterns)