feat(ir,dsl): add pl.runtime_print for runtime tile/tensor debugging #857

Hzfengsy wants to merge 5 commits into hw-native-sys:main
Conversation
Add a runtime_print DSL function that lowers to pto.tprint, enabling users to print tile and tensor contents for on-device debugging.

- Register tile.runtime_print and tensor.runtime_print C++ IR ops
- Add Python IR, DSL, and unified dispatch layers
- Register tensor-to-tile conversion for InCore scope lowering
- Add codegen mapping to pto.tprint for both ops
- Add unit tests (10) and system tests (2)

Closes hw-native-sys#846
📝 Walkthrough

The PR adds a runtime-print debugging feature across the pipeline: new IR ops tile.runtime_print and tensor.runtime_print, Python IR and DSL bindings with unified dispatch, a tensor-to-tile conversion for lowering, and backend codegen to pto.tprint.
Sequence Diagram

```mermaid
sequenceDiagram
    participant DSL as DSL User\n(pl.runtime_print)
    participant PyIR as Python IR Ops\n(_ir_ops.runtime_print)
    participant IRReg as IR Registration\n(REGISTER_OP)
    participant TypeConv as Type Conversion\n(OpConversionRegistry)
    participant Backend as Backend\n(pto_ops_common)
    participant PTO as PTO\n(pto.tprint)
    DSL->>PyIR: call runtime_print(src)
    PyIR->>IRReg: create Call to tensor.runtime_print or tile.runtime_print
    IRReg->>IRReg: deduce type: validate arg type, return pass-through
    TypeConv->>TypeConv: lower tensor.runtime_print → tile.runtime_print (may insert tile.load)
    Backend->>PTO: map tile.runtime_print → emit pto.tprint
```
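The unified-dispatch step at the top of the diagram can be sketched as follows. This is a minimal illustration only: `Tensor`, `Tile`, and the per-namespace helpers are stand-in stubs, not the real pypto classes, and the pass-through return mirrors the "return pass-through" type deduction described above.

```python
# Hypothetical sketch of the unified dispatch layer; Tensor, Tile, and the
# helper functions are invented stand-ins, not the actual pypto classes.

class Tensor:  # stub for the DSL Tensor type
    pass

class Tile:  # stub for the DSL Tile type
    pass

def _tensor_runtime_print(src):
    # would build a Call to tensor.runtime_print; type is pass-through
    return src

def _tile_runtime_print(src):
    # would build a Call to tile.runtime_print; type is pass-through
    return src

def runtime_print(src):
    """Dispatch on the argument type, mirroring pl.runtime_print."""
    if isinstance(src, Tensor):
        return _tensor_runtime_print(src)
    if isinstance(src, Tile):
        return _tile_runtime_print(src)
    raise TypeError(
        f"runtime_print expects a Tensor or Tile, got {type(src).__name__}"
    )
```

The pass-through return lets `runtime_print` sit inline in an expression without changing the program's results, which is what the system tests verify end to end.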
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Code Review
This pull request introduces a runtime_print utility for both tensors and tiles to facilitate debugging by emitting pto.tprint instructions. The implementation spans the C++ IR, Python bindings, and unified language operators, including support for type deduction and IR conversion. Comprehensive unit and runtime tests were added to verify the new functionality. Feedback was provided to improve test precision by catching a specific TypeError instead of a generic Exception in the unit tests.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/ir/op/tensor_ops/utility.cpp (1)
Lines 44-51: Consider adding `.no_memory_spec()` for consistency with `tile.runtime_print`.

The `tile.runtime_print` registration in `src/ir/op/tile_ops/utility.cpp` (line 48) includes `.no_memory_spec()`, but this tensor counterpart omits it. Since `tensor.runtime_print` is similarly a pure side-effect debugging operation with no memory specification requirements, adding it would maintain consistency.

♻️ Proposed fix

```diff
 REGISTER_OP("tensor.runtime_print")
     .set_op_category("TensorOp")
     .set_description("Print tensor contents for debugging (generates pto.tprint)")
     .add_argument("tensor", "Input tensor to print (TensorType)")
+    .no_memory_spec()
     .f_deduce_type([](const std::vector<ExprPtr>& args,
                       const std::vector<std::pair<std::string, std::any>>& kwargs) {
       return DeduceTensorPrintType(args, kwargs, "tensor.runtime_print");
     });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ir/op/tensor_ops/utility.cpp` around lines 44 - 51, The tensor.runtime_print op registration is missing .no_memory_spec(), making it inconsistent with tile.runtime_print; update the REGISTER_OP("tensor.runtime_print") chain to include .no_memory_spec() (alongside set_op_category, set_description, add_argument, and f_deduce_type) so the debug-only op declares no memory specification requirement—keep DeduceTensorPrintType(...) and the existing f_deduce_type call unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/ut/language/parser/test_runtime_print.py`:
- Around line 185-193: Update the test
test_runtime_print_requires_tile_or_tensor to assert the specific exception
TypeError rather than a bare Exception: replace pytest.raises(Exception) with
pytest.raises(TypeError) so the test verifies that pl.runtime_print(x) (in the
function defined inside the test) raises TypeError for non-Tensor/Tile inputs;
keep the same test body and references to pl.runtime_print and the inner
function to locate the change.
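The reasoning behind that suggestion can be illustrated without pytest: catching a bare `Exception` also swallows unrelated failures (a typo raising `NameError` would still satisfy the test), whereas asserting `TypeError` pins down the intended contract. The `runtime_print` stub and the tiny `raises` helper below are hypothetical analogues, not the real pypto or pytest APIs.

```python
# Hypothetical stub mirroring the type check in pl.runtime_print;
# list/tuple stand in for the real (Tensor, Tile) check.
def runtime_print(src):
    if not isinstance(src, (list, tuple)):
        raise TypeError("runtime_print expects a Tensor or Tile")
    return src

def buggy(src):
    # A typo that raises NameError instead of the intended TypeError.
    return undefined_name  # noqa: F821

def raises(exc_type, fn, *args):
    """Tiny analogue of pytest.raises: True iff fn(*args) raises exc_type."""
    try:
        fn(*args)
    except exc_type:
        return True
    except Exception:
        return False
    return False

# A test written with raises(Exception, ...) would pass for buggy() too,
# masking the wrong error; raises(TypeError, ...) catches the distinction.
```

With `pytest.raises(TypeError)` instead of `pytest.raises(Exception)`, the unit test fails loudly if the implementation ever starts raising the wrong exception class.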
📒 Files selected for processing (13)

- CMakeLists.txt
- python/pypto/ir/op/tensor_ops.py
- python/pypto/ir/op/tile_ops.py
- python/pypto/language/__init__.py
- python/pypto/language/op/tensor_ops.py
- python/pypto/language/op/tile_ops.py
- python/pypto/language/op/unified_ops.py
- src/backend/common/pto_ops_common.cpp
- src/ir/op/tensor_ops/utility.cpp
- src/ir/op/tile_ops/utility.cpp
- src/ir/transforms/op_conversion_registry.cpp
- tests/st/runtime/test_runtime_print.py
- tests/ut/language/parser/test_runtime_print.py
Pull request overview

Adds a new debugging utility pl.runtime_print(tile_or_tensor) to the PyPTO DSL that lowers to pto.tprint, enabling runtime printing of tile/tensor contents without affecting program results.

Changes:
- Introduce new IR ops tile.runtime_print and tensor.runtime_print with pass-through type deduction.
- Add DSL APIs for unified dispatch (pl.runtime_print) plus explicit namespaces (pl.tile.runtime_print, pl.tensor.runtime_print).
- Add backend codegen mappings to emit pto.tprint, plus new unit + system tests.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/ut/language/parser/test_runtime_print.py | Parser/unit coverage for IR shape, printing roundtrip, and basic error handling. |
| tests/st/runtime/test_runtime_print.py | End-to-end runtime coverage to ensure pto.tprint emission doesn’t change results. |
| src/ir/transforms/op_conversion_registry.cpp | Adds tensor→tile op conversion entry for tensor.runtime_print. |
| src/ir/op/tile_ops/utility.cpp | Registers tile.runtime_print IR op and type deduction. |
| src/ir/op/tensor_ops/utility.cpp | Registers tensor.runtime_print IR op and type deduction. |
| src/backend/common/pto_ops_common.cpp | Maps the new ops to pto.tprint codegen. |
| python/pypto/language/op/unified_ops.py | Adds unified pl.runtime_print dispatch (Tensor vs Tile). |
| python/pypto/language/op/tile_ops.py | Adds pl.tile.runtime_print. |
| python/pypto/language/op/tensor_ops.py | Adds pl.tensor.runtime_print. |
| python/pypto/language/__init__.py | Re-exports runtime_print at pypto.language top level. |
| python/pypto/ir/op/tile_ops.py | Adds IR builder helper for tile.runtime_print. |
| python/pypto/ir/op/tensor_ops.py | Adds IR builder helper for tensor.runtime_print. |
| CMakeLists.txt | Includes the new C++ op source files in the build. |
- Replace RegisterSimple with RegisterCustom for the tensor.runtime_print conversion: it inserts a tile.load prologue when the argument is still a TensorType (e.g. printing a function parameter before any explicit tile.load), matching the tensor.fillpad pattern.
- Use InvalidOperationError instead of a bare Exception in the error-case test for precision.
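The shape of that custom conversion can be sketched as below. Everything here is illustrative only: `TensorType`, `TileType`, `Value`, and `tile_load` are invented stand-ins for the real IR node classes, and the tuple return is a simplification of actual IR construction.

```python
# Illustrative-only sketch of the RegisterCustom conversion; all classes
# here are invented stand-ins, not the real pypto IR nodes.

class TensorType:  # stub: argument has not been loaded into a tile yet
    pass

class TileType:  # stub: argument already lives in tile form
    pass

class Value:
    def __init__(self, type_):
        self.type = type_

def tile_load(value):
    # stand-in for emitting a tile.load prologue over a tensor value
    return Value(TileType())

def convert_runtime_print(arg):
    """Lower tensor.runtime_print -> tile.runtime_print, inserting a
    tile.load first when the argument is still tensor-typed (e.g. a
    function parameter that was never explicitly loaded)."""
    if isinstance(arg.type, TensorType):
        arg = tile_load(arg)  # prologue, matching the tensor.fillpad pattern
    return ("tile.runtime_print", arg)
```

A simple RegisterSimple-style one-to-one rename could not express this conditional prologue, which is why the commit switches to a custom conversion.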
MakePrintCodegenPTO was emitting a placeholder type with the wrong separator ('|' instead of ':') and a dummy type string instead of the actual tile buffer type from GetExprTypeAnnotation.
pto-isa guards TPRINT behind #ifdef _DEBUG. When ptoas-generated code contains TPRINT (from pto.tprint / runtime_print), insert #define _DEBUG before the pto-inst.hpp include so the macro is available.
_DEBUG enables cce::printf calls across all pto-isa headers, which don't compile in simulation. Instead, inject a no-op TPRINT template after the include, guarded by #ifndef _DEBUG so the real implementation is used on hardware.
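The source patching described in these two commits can be sketched as a post-processing pass over the generated C++ text. The include name follows the commit messages, but the exact no-op TPRINT body and the string-level patching approach are assumptions for illustration, not the actual ptoas implementation.

```python
# Sketch of the post-processing step described above; the no-op TPRINT body
# and the exact patching mechanics are assumptions, not ptoas internals.

NOOP_TPRINT = (
    "#ifndef _DEBUG\n"
    "template <typename... Args>\n"
    "inline void TPRINT(Args&&...) {}  // no-op in simulation builds\n"
    "#endif\n"
)

def patch_generated_source(src: str) -> str:
    """If the generated code uses TPRINT, inject a no-op fallback right
    after the pto-inst.hpp include so simulation builds still compile,
    while hardware builds (where _DEBUG is defined) keep the real one."""
    include = '#include "pto-inst.hpp"'
    if "TPRINT" not in src or include not in src:
        return src  # nothing to do: no prints, or unexpected layout
    return src.replace(include, include + "\n" + NOOP_TPRINT, 1)
```

Guarding the injected template with `#ifndef _DEBUG` means it vanishes on hardware builds, where pto-isa's real `cce::printf`-backed implementation takes over.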
Summary

- Add a pl.runtime_print(tile_or_tensor) DSL function that lowers to pto.tprint, enabling runtime debugging of tile and tensor contents on device
- Support tiles (pl.runtime_print(tile) / pl.tile.runtime_print(tile)) and tensors (pl.runtime_print(tensor) / pl.tensor.runtime_print(tensor)) via unified dispatch
- Register tile.runtime_print and tensor.runtime_print C++ IR ops with pass-through type deduction
- Ensure tensor.runtime_print lowers correctly in InCore scope
- Map both ops to pto.tprint in codegen

Test plan
Closes #846