[Feature] Tensor dump for runtime debugging and validation #506

@ChaoZheng109

Summary

Add a tensor dump capability that captures intermediate tensor data (inputs before dispatch and outputs after completion) during runtime execution. This enables offline debugging, golden-value validation, and kernel correctness verification without modifying user kernels.

The feature spans three layers:

  • Platform layer: common tensor dump interface, AICPU-side dump logic, and host-side collector for gathering dumped data
  • Runtime layer: integration into the host_build_graph runtime, with support for aicpu_build_graph and tensormap_and_ringbuffer planned
  • User interface: --dump-tensor CLI flag in run_example.py and a dedicated example (dump_tensor_example) demonstrating usage

Motivation / Use Case

When debugging kernel correctness issues or validating new orchestration flows, developers currently have no built-in way to inspect intermediate tensor values at each execution step. They must manually instrument kernel code or add ad-hoc print statements, which is error-prone and non-reproducible.

A first-class tensor dump feature allows:

  • Capturing before-dispatch inputs and after-completion outputs per task, saved to disk as binary files
  • Comparing dumped tensors against golden computations to pinpoint which kernel or step produces incorrect results
  • Debugging without modifying kernel source — the dump is controlled entirely from the runtime/platform layer
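A golden comparison of the kind described above could look roughly like the sketch below. It rests on assumptions the issue does not confirm: that a dump file is a headerless run of float32 values in native byte order, and that an element-wise tolerance check is acceptable. The helper names `load_f32` and `first_mismatch` are hypothetical and not part of the proposed feature.

```python
import math
from array import array
from pathlib import Path


def load_f32(path):
    """Read a raw binary file as a flat list of float32 values.

    Assumption: the file has no header and uses native byte order.
    """
    values = array("f")
    values.frombytes(Path(path).read_bytes())
    return values.tolist()


def first_mismatch(dumped, golden, rel_tol=1e-5, abs_tol=1e-6):
    """Return the index of the first differing element, or None if all match."""
    if len(dumped) != len(golden):
        raise ValueError(f"length mismatch: {len(dumped)} vs {len(golden)}")
    for i, (d, g) in enumerate(zip(dumped, golden)):
        if not math.isclose(d, g, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    return None
```

Running `first_mismatch` over each task's dumped outputs in execution order would point to the earliest step whose result diverges from the golden computation.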

Proposed API / Behavior

  • Enable via --dump-tensor flag on run_example.py
  • Runtime sets enable_dump in kernel args; AICPU reads this flag and writes tensor data to a host-visible region
  • Host-side TensorDumpCollector gathers and writes binary dump files organized by task ID and tensor index
  • Output directory: outputs/tensor_dump_<timestamp>/
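The host-side flow above can be sketched as follows. Only the `--dump-tensor` flag and the `outputs/tensor_dump_<timestamp>/` directory come from this proposal. The timestamp format, the file naming scheme, and the helpers `parse_args`, `dump_dir`, and `write_dump` are assumptions made for illustration; they are not the actual `TensorDumpCollector` interface.

```python
import argparse
import time
from pathlib import Path


def parse_args(argv=None):
    """Parse the CLI; --dump-tensor is off unless given."""
    parser = argparse.ArgumentParser(description="run_example.py (sketch)")
    parser.add_argument(
        "--dump-tensor",
        action="store_true",
        help="capture task inputs and outputs to binary files",
    )
    return parser.parse_args(argv)


def dump_dir(root="outputs", now=None):
    """Build outputs/tensor_dump_<timestamp>/ (timestamp format is assumed)."""
    stamp = time.strftime("%Y%m%d_%H%M%S", time.localtime(now))
    return Path(root) / f"tensor_dump_{stamp}"


def write_dump(directory, task_id, tensor_index, stage, payload):
    """Write one tensor's raw bytes; the file name layout is hypothetical.

    stage is "before" (inputs before dispatch) or "after" (outputs after
    completion).
    """
    if stage not in ("before", "after"):
        raise ValueError(f"unknown stage: {stage!r}")
    path = Path(directory) / f"task{task_id}_tensor{tensor_index}_{stage}.bin"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return path
```

Keying file names by task ID, tensor index, and stage keeps each dump addressable on its own, so a comparison script can load a single tensor without parsing a combined archive.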

Additional Context

Work in progress — currently implemented for host_build_graph runtime on the a2a3 architecture.

Metadata

Labels

enhancement (New feature or request)

Status

In Progress
