
feat(ir,dsl): add pl.runtime_print for runtime tile/tensor debugging #857

Draft
Hzfengsy wants to merge 5 commits into hw-native-sys:main from Hzfengsy:feat/runtime-print

Conversation

@Hzfengsy

@Hzfengsy Hzfengsy commented Apr 2, 2026

Summary

  • Add pl.runtime_print(tile_or_tensor) DSL function that lowers to pto.tprint, enabling runtime debugging of tile and tensor contents on device
  • Supports both tiles (pl.runtime_print(tile) / pl.tile.runtime_print(tile)) and tensors (pl.runtime_print(tensor) / pl.tensor.runtime_print(tensor)) via unified dispatch
  • Register tile.runtime_print and tensor.runtime_print C++ IR ops with pass-through type deduction
  • Register tensor-to-tile conversion so tensor.runtime_print lowers correctly in InCore scope
  • Add codegen mapping to pto.tprint for both ops
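
The unified dispatch described above can be pictured with a minimal, self-contained sketch. It does not import PyPTO: `Tensor`, `Tile`, and the three functions are hypothetical stand-ins, and they return the name of the IR op they would emit instead of building real IR. Raising `TypeError` for unsupported inputs is also an assumption made for illustration.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the DSL's Tensor and Tile wrappers.
# The real classes live in pypto.language and are not reproduced here.
@dataclass
class Tensor:
    name: str


@dataclass
class Tile:
    name: str


def tensor_runtime_print(tensor: Tensor) -> str:
    """Stand-in for pl.tensor.runtime_print: report the IR op it would emit."""
    return f"tensor.runtime_print({tensor.name})"


def tile_runtime_print(tile: Tile) -> str:
    """Stand-in for pl.tile.runtime_print: report the IR op it would emit."""
    return f"tile.runtime_print({tile.name})"


def runtime_print(src) -> str:
    """Unified entry point: choose the op namespace from the argument type."""
    if isinstance(src, Tensor):
        return tensor_runtime_print(src)
    if isinstance(src, Tile):
        return tile_runtime_print(src)
    # Assumption: anything else (for example a scalar) is rejected.
    raise TypeError(
        f"runtime_print expects a Tensor or Tile, got {type(src).__name__}"
    )


if __name__ == "__main__":
    print(runtime_print(Tensor("a")))
    print(runtime_print(Tile("t")))
```

In the PR itself the DSL helpers are statements that return `None`; the string return value here only makes the chosen branch visible.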

Test plan

  • 10 unit tests: parser, roundtrip, type preservation, namespace access, error cases
  • 2 system tests: tile print and tensor print with PTOTestCase harness
  • Full test suite: 3316 passed, 0 failed
  • clang-tidy: clean
  • pyright: clean

Closes #846

…w-native-sys#846)

Add runtime_print DSL function that lowers to pto.tprint, enabling
users to print tile and tensor contents for on-device debugging.

- Register tile.runtime_print and tensor.runtime_print C++ IR ops
- Add Python IR, DSL, and unified dispatch layers
- Register tensor-to-tile conversion for InCore scope lowering
- Add codegen mapping to pto.tprint for both ops
- Add unit tests (10) and system tests (2)

Closes hw-native-sys#846
Copilot AI review requested due to automatic review settings April 2, 2026 10:56
@coderabbitai

coderabbitai bot commented Apr 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

The PR adds a runtime-print debugging feature across the pipeline: new IR ops tile.runtime_print and tensor.runtime_print (C++), Python IR wrappers and DSL helpers (pl.runtime_print), conversion lowering from tensor→tile, backend mapping to the existing print codegen, build updates, and unit/system tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Build Configuration: CMakeLists.txt | Added src/ir/op/tile_ops/utility.cpp and src/ir/op/tensor_ops/utility.cpp to PYTO_SOURCES. |
| IR Op Implementations: src/ir/op/tile_ops/utility.cpp, src/ir/op/tensor_ops/utility.cpp | New IR ops tile.runtime_print and tensor.runtime_print with type-deduction helpers that validate single-arg and return the input Tile/Tensor type (pass-through). |
| IR → Tile Conversion: src/ir/transforms/op_conversion_registry.cpp | Added converter for tensor.runtime_print: if input is TileType, call tile.runtime_print; if TensorType, emit a tile.load prologue then tile.runtime_print. |
| Backend Mapping: src/backend/common/pto_ops_common.cpp | Replaced former tile.print registration with registrations mapping tile.runtime_print and tensor.runtime_print to the shared print codegen factory (pto.tprint). |
| Python IR Wrappers: python/pypto/ir/op/tile_ops.py, python/pypto/ir/op/tensor_ops.py | Added runtime_print(expr, span=None) -> Call helpers that emit tile.runtime_print / tensor.runtime_print IR calls. |
| DSL (Type-Specific): python/pypto/language/op/tile_ops.py, python/pypto/language/op/tensor_ops.py | Added runtime_print(tile/tensor) -> None DSL statement helpers (unwrap and forward to IR ops); updated __all__. |
| DSL (Unified) & Public API: python/pypto/language/op/unified_ops.py, python/pypto/language/__init__.py | Added unified `runtime_print(src: Tensor` … |
| Unit Tests: tests/ut/language/parser/test_runtime_print.py | New unit tests verifying IR emission, call operator names/types, statement insertion, round-trip source parsing, and error on scalar inputs. |
| System Tests: tests/st/runtime/test_runtime_print.py | End-to-end tests for tile and tensor usage ensuring runtime_print side-effects do not change numeric results on 128×128 FP32 inputs. |
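
The tensor→tile conversion rule summarized in the table can be sketched in plain Python. This is only a model of the described behavior, not the C++ converter: `Value`, its `kind` field, and `convert_tensor_runtime_print` are hypothetical names, ops are represented as strings, and the `tile.load` call omits whatever offset and shape arguments the real op takes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """Hypothetical IR value: a name plus whether it is a tensor or a tile."""
    name: str
    kind: str  # "tensor" or "tile"


def convert_tensor_runtime_print(arg: Value) -> list[str]:
    """Return the tile-level ops that replace tensor.runtime_print(arg)."""
    if arg.kind == "tile":
        # The argument was already lowered to a tile: print it directly.
        return [f"tile.runtime_print({arg.name})"]
    if arg.kind == "tensor":
        # Still a tensor (e.g. a function parameter): load it first,
        # then print the loaded tile.
        loaded = f"{arg.name}_tile"
        return [
            f"{loaded} = tile.load({arg.name})",
            f"tile.runtime_print({loaded})",
        ]
    raise ValueError(f"unsupported value kind: {arg.kind!r}")


if __name__ == "__main__":
    for op in convert_tensor_runtime_print(Value("x", "tensor")):
        print(op)
```

The two branches mirror the two cases in the conversion row: an input that is already a tile needs no prologue, while a tensor input gets a load inserted ahead of the print.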

Sequence Diagram

sequenceDiagram
    participant DSL as DSL User\n(pl.runtime_print)
    participant PyIR as Python IR Ops\n(_ir_ops.runtime_print)
    participant IRReg as IR Registration\n(REGISTER_OP)
    participant TypeConv as Type Conversion\n(OpConversionRegistry)
    participant Backend as Backend\n(pto_ops_common)
    participant PTO as PTO\n(pto.tprint)

    DSL->>PyIR: call runtime_print(src)
    PyIR->>IRReg: create Call to\ntensor.runtime_print or tile.runtime_print
    IRReg->>IRReg: deduce type:\nvalidate arg type, return pass-through
    TypeConv->>TypeConv: lower tensor.runtime_print\n→ tile.runtime_print (may insert tile.load)
    Backend->>PTO: map tile.runtime_print\n→ emit pto.tprint

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Poem

🐰 I hopped through code to add a print delight,

Tiles and tensors now speak in the night,
From DSL to PTO they follow the trail,
A rabbit’s debug hop — no detail will fail!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 51.02% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title accurately describes the main change: adding a runtime_print function for debugging tiles/tensors via pl.runtime_print. |
| Description check | ✅ Passed | The description provides relevant context about the feature addition, implementation approach, and testing, all related to the changeset. |
| Linked Issues check | ✅ Passed | The PR fully implements the requirements from issue #846: adds a DSL-level print helper (pl.runtime_print instead of pl.print) that lowers to pto.tprint for tile/tensor debugging. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the objective of adding pl.runtime_print for runtime debugging; no unrelated modifications are present. |



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a runtime_print utility for both tensors and tiles to facilitate debugging by emitting pto.tprint instructions. The implementation spans the C++ IR, Python bindings, and unified language operators, including support for type deduction and IR conversion. Comprehensive unit and runtime tests were added to verify the new functionality. Feedback was provided to improve test precision by catching a specific TypeError instead of a generic Exception in the unit tests.
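
The point about catching a specific exception can be shown with a small sketch. It uses the standard library's `unittest.TestCase.assertRaises` as a stand-in for `pytest.raises` so that it runs without extra dependencies; `runtime_print`, `broken_runtime_print`, and `check_passes` are hypothetical helpers, and `TypeError` is the exception type named in the review feedback, not necessarily the one the DSL raises.

```python
import unittest


def runtime_print(src):
    """Hypothetical stand-in that rejects scalar inputs."""
    if isinstance(src, (int, float)):
        raise TypeError("runtime_print expects a Tensor or Tile")
    return None


def broken_runtime_print(src):
    """A broken variant that fails for an unrelated reason."""
    raise RuntimeError("unrelated failure")


case = unittest.TestCase()


def check_passes(fn, expected) -> bool:
    """Report whether 'fn(1.0) raises expected' would pass as a test."""
    try:
        with case.assertRaises(expected):
            fn(1.0)
    except Exception:
        # Either nothing was raised or a different exception escaped.
        return False
    return True


if __name__ == "__main__":
    print(check_passes(runtime_print, TypeError))         # precise check
    print(check_passes(broken_runtime_print, Exception))  # broad check
    print(check_passes(broken_runtime_print, TypeError))  # precise check
```

A check against bare `Exception` passes even for the broken variant, while the check against the specific type rejects it, which is the precision gain the feedback asks for.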


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/ir/op/tensor_ops/utility.cpp (1)

44-51: Consider adding .no_memory_spec() for consistency with tile.runtime_print.

The tile.runtime_print registration in src/ir/op/tile_ops/utility.cpp (line 48) includes .no_memory_spec(), but this tensor counterpart omits it. Since tensor.runtime_print is similarly a pure side-effect debugging operation with no memory specification requirements, adding it would maintain consistency.

♻️ Proposed fix
 REGISTER_OP("tensor.runtime_print")
     .set_op_category("TensorOp")
     .set_description("Print tensor contents for debugging (generates pto.tprint)")
     .add_argument("tensor", "Input tensor to print (TensorType)")
+    .no_memory_spec()
     .f_deduce_type([](const std::vector<ExprPtr>& args,
                       const std::vector<std::pair<std::string, std::any>>& kwargs) {
       return DeduceTensorPrintType(args, kwargs, "tensor.runtime_print");
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/op/tensor_ops/utility.cpp` around lines 44 - 51, The
tensor.runtime_print op registration is missing .no_memory_spec(), making it
inconsistent with tile.runtime_print; update the
REGISTER_OP("tensor.runtime_print") chain to include .no_memory_spec()
(alongside set_op_category, set_description, add_argument, and f_deduce_type) so
the debug-only op declares no memory specification requirement—keep
DeduceTensorPrintType(...) and the existing f_deduce_type call unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/ut/language/parser/test_runtime_print.py`:
- Around line 185-193: Update the test
test_runtime_print_requires_tile_or_tensor to assert the specific exception
TypeError rather than a bare Exception: replace pytest.raises(Exception) with
pytest.raises(TypeError) so the test verifies that pl.runtime_print(x) (in the
function defined inside the test) raises TypeError for non-Tensor/Tile inputs;
keep the same test body and references to pl.runtime_print and the inner
function to locate the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8e256409-ad0b-4377-be78-07056253e042

📥 Commits

Reviewing files that changed from the base of the PR and between d765fc0 and 98afb47.

📒 Files selected for processing (13)
  • CMakeLists.txt
  • python/pypto/ir/op/tensor_ops.py
  • python/pypto/ir/op/tile_ops.py
  • python/pypto/language/__init__.py
  • python/pypto/language/op/tensor_ops.py
  • python/pypto/language/op/tile_ops.py
  • python/pypto/language/op/unified_ops.py
  • src/backend/common/pto_ops_common.cpp
  • src/ir/op/tensor_ops/utility.cpp
  • src/ir/op/tile_ops/utility.cpp
  • src/ir/transforms/op_conversion_registry.cpp
  • tests/st/runtime/test_runtime_print.py
  • tests/ut/language/parser/test_runtime_print.py


Copilot AI left a comment


Pull request overview

Adds a new debugging utility pl.runtime_print(tile_or_tensor) to the PyPTO DSL that lowers to pto.tprint, enabling runtime printing of tile/tensor contents without affecting program results.

Changes:

  • Introduce new IR ops tile.runtime_print and tensor.runtime_print with pass-through type deduction.
  • Add DSL APIs for unified dispatch (pl.runtime_print) plus explicit namespaces (pl.tile.runtime_print, pl.tensor.runtime_print).
  • Add backend codegen mappings to emit pto.tprint, plus new unit + system tests.
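
"Pass-through type deduction" means the op's result type is simply the type of its single argument. A minimal Python model of that rule follows; `TileType`, `ScalarType`, and `deduce_print_type` are hypothetical and only mirror the behavior described in this PR (validate one argument of the expected kind, return its type unchanged). The exception types are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileType:
    """Hypothetical tile type: a shape and an element type."""
    shape: tuple
    dtype: str


@dataclass(frozen=True)
class ScalarType:
    """Hypothetical scalar type, used here only as a rejected input."""
    dtype: str


def deduce_print_type(arg_types, op_name="tile.runtime_print"):
    """Validate the argument list and return the input type unchanged."""
    if len(arg_types) != 1:
        raise ValueError(
            f"{op_name} expects exactly 1 argument, got {len(arg_types)}"
        )
    (arg_type,) = arg_types
    if not isinstance(arg_type, TileType):
        raise TypeError(
            f"{op_name} expects a TileType, got {type(arg_type).__name__}"
        )
    # Pass-through: the result type is the argument's own type.
    return arg_type


if __name__ == "__main__":
    tile_type = TileType(shape=(128, 128), dtype="fp32")
    print(deduce_print_type([tile_type]))
```

Because the result type equals the input type, inserting the print leaves the types seen by surrounding IR unchanged.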

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/ut/language/parser/test_runtime_print.py | Parser/unit coverage for IR shape, printing roundtrip, and basic error handling. |
| tests/st/runtime/test_runtime_print.py | End-to-end runtime coverage to ensure pto.tprint emission doesn’t change results. |
| src/ir/transforms/op_conversion_registry.cpp | Adds tensor→tile op conversion entry for tensor.runtime_print. |
| src/ir/op/tile_ops/utility.cpp | Registers tile.runtime_print IR op and type deduction. |
| src/ir/op/tensor_ops/utility.cpp | Registers tensor.runtime_print IR op and type deduction. |
| src/backend/common/pto_ops_common.cpp | Maps the new ops to pto.tprint codegen. |
| python/pypto/language/op/unified_ops.py | Adds unified pl.runtime_print dispatch (Tensor vs Tile). |
| python/pypto/language/op/tile_ops.py | Adds pl.tile.runtime_print. |
| python/pypto/language/op/tensor_ops.py | Adds pl.tensor.runtime_print. |
| python/pypto/language/__init__.py | Re-exports runtime_print at pypto.language top level. |
| python/pypto/ir/op/tile_ops.py | Adds IR builder helper for tile.runtime_print. |
| python/pypto/ir/op/tensor_ops.py | Adds IR builder helper for tensor.runtime_print. |
| CMakeLists.txt | Includes the new C++ op source files in the build. |

Hzfengsy added 4 commits April 3, 2026 16:23

- Replace RegisterSimple with RegisterCustom for tensor.runtime_print
  conversion: inserts tile.load prologue when the argument is still a
  TensorType (e.g. printing a function parameter before any explicit
  tile.load), matching the tensor.fillpad pattern
- Use InvalidOperationError instead of bare Exception in the
  error-case test for precision

MakePrintCodegenPTO was emitting a placeholder type with wrong
separator ('|' instead of ':') and a dummy type string instead of
the actual tile buffer type from GetExprTypeAnnotation.

pto-isa guards TPRINT behind #ifdef _DEBUG. When ptoas-generated code
contains TPRINT (from pto.tprint / runtime_print), insert #define _DEBUG
before the pto-inst.hpp include so the macro is available.

_DEBUG enables cce::printf calls across all pto-isa headers, which
don't compile in simulation. Instead, inject a no-op TPRINT template
after the include, guarded by #ifndef _DEBUG so the real implementation
is used on hardware.
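
The last commit message describes a source post-processing step: when generated code uses TPRINT, a guarded no-op definition is placed after the pto-inst.hpp include. A rough Python sketch of that idea is below. The function name `inject_noop_tprint`, the exact text of the injected C++ stub, and the line-based matching are all assumptions for illustration; the actual change in this PR may be implemented differently.

```python
# Assumed shape of the injected stub: a variadic no-op template that is only
# compiled when _DEBUG is not defined, so a real TPRINT wins on hardware.
NOOP_TPRINT = (
    "#ifndef _DEBUG\n"
    "template <typename... Args>\n"
    "inline void TPRINT(Args&&...) {}\n"
    "#endif\n"
)


def inject_noop_tprint(source: str, include_name: str = "pto-inst.hpp") -> str:
    """Insert the guarded no-op stub after the first matching #include line."""
    if "TPRINT" not in source:
        # Nothing prints at runtime, so leave the generated code untouched.
        return source
    out = []
    injected = False
    for line in source.splitlines(keepends=True):
        out.append(line)
        is_target = line.lstrip().startswith("#include") and include_name in line
        if not injected and is_target:
            if not line.endswith("\n"):
                out.append("\n")
            out.append(NOOP_TPRINT)
            injected = True
    return "".join(out)


if __name__ == "__main__":
    generated = '#include "pto/pto-inst.hpp"\nvoid kernel() { TPRINT(t); }\n'
    print(inject_noop_tprint(generated))
```

Guarding the stub with `#ifndef _DEBUG` matches the stated intent: simulation builds get a TPRINT that compiles to nothing, while builds that define `_DEBUG` keep the real implementation.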
@Hzfengsy Hzfengsy marked this pull request as draft April 6, 2026 06:54

Development

Successfully merging this pull request may close these issues.

[Feature] Add pl.print DSL function to generate pto.tprint

2 participants