Summary
Add a tensor dump capability that captures intermediate tensor data (inputs before dispatch and outputs after completion) during runtime execution. This enables offline debugging, golden-value validation, and kernel correctness verification without modifying user kernels.
The feature spans three layers:
- Platform layer: common tensor dump interface, AICPU-side dump logic, and host-side collector for gathering dumped data
- Runtime layer: integration into the `host_build_graph` runtime (with future support for `aicpu_build_graph` and `tensormap_and_ringbuffer`)
- User interface: `--dump-tensor` CLI flag in `run_example.py` and a dedicated example (`dump_tensor_example`) demonstrating usage
Motivation / Use Case
When debugging kernel correctness issues or validating new orchestration flows, developers currently have no built-in way to inspect intermediate tensor values at each execution step. They must manually instrument kernel code or add ad-hoc print statements, which is error-prone and non-reproducible.
A first-class tensor dump feature allows:
- Capturing before-dispatch inputs and after-completion outputs per task, saved to disk as binary files
- Comparing dumped tensors against golden computations to pinpoint which kernel or step produces incorrect results
- Debugging without modifying kernel source — the dump is controlled entirely from the runtime/platform layer
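The golden-comparison workflow above could be sketched as below. This is a hypothetical helper, not the actual implementation: the dump file format is assumed to be raw little-endian float32, and the function names are invented for illustration.

```python
import struct
from pathlib import Path

def load_dump(path):
    """Read a dumped tensor file as a flat list of floats.

    Assumes raw little-endian float32 binary data, which is a guess
    at the on-disk format used by the tensor dump feature.
    """
    data = Path(path).read_bytes()
    count = len(data) // 4  # 4 bytes per float32 element
    return list(struct.unpack(f"<{count}f", data))

def max_abs_diff(dumped, golden):
    """Largest element-wise deviation between a dump and its golden values."""
    assert len(dumped) == len(golden), "shape mismatch between dump and golden"
    return max(abs(d - g) for d, g in zip(dumped, golden))
```

With helpers like these, a test harness can load each per-task dump file and flag the first step whose output drifts from the golden computation beyond a tolerance.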
Proposed API / Behavior
- Enable via the `--dump-tensor` flag on `run_example.py`
- Runtime sets `enable_dump` in the kernel args; AICPU reads this flag and writes tensor data to a host-visible region
- Host-side `TensorDumpCollector` gathers and writes binary dump files organized by task ID and tensor index
- Output directory: `outputs/tensor_dump_<timestamp>/`
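The host-side collector behavior could look roughly like the following sketch. The class name and output directory come from this proposal; the method signature, per-file naming scheme, and the `stage` label distinguishing before-dispatch inputs from after-completion outputs are assumptions for illustration only.

```python
import time
from pathlib import Path

class TensorDumpCollector:
    """Hypothetical host-side collector: gathers dumped tensor bytes and
    writes them as binary files organized by task ID and tensor index."""

    def __init__(self, root="outputs"):
        # Timestamped directory per run, matching outputs/tensor_dump_<timestamp>/
        stamp = time.strftime("%Y%m%d_%H%M%S")
        self.out_dir = Path(root) / f"tensor_dump_{stamp}"
        self.out_dir.mkdir(parents=True, exist_ok=True)

    def collect(self, task_id, tensor_index, data, stage="input"):
        """Persist one tensor blob; 'stage' marks before-dispatch inputs
        vs. after-completion outputs (file naming here is an assumption)."""
        path = self.out_dir / f"task_{task_id}_{stage}_{tensor_index}.bin"
        path.write_bytes(data)
        return path
```

In the real flow the runtime would set `enable_dump` in the kernel args, the AICPU would copy tensor data into a host-visible region, and a collector along these lines would drain that region to disk after each task completes.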
Additional Context
Work in progress: currently implemented for the `host_build_graph` runtime on the a2a3 architecture.