Skip to content

sjduan/simpler

 
 

Repository files navigation

PTO Runtime - Task Runtime Execution Framework

Modular runtime for building and executing task dependency graphs on Ascend devices with coordinated AICPU and AICore execution. Three independently compiled programs (Host .so, AICPU .so, AICore .o) work together through clearly defined APIs.

Quick Start

# Clone the repository
git clone <repo-url>
cd simpler

# Run the vector example (simulation, no hardware required)
python examples/scripts/run_example.py \
  -k examples/a2a3/host_build_graph/vector_example/kernels \
  -g examples/a2a3/host_build_graph/vector_example/golden.py \
  -p a2a3sim

PTO ISA headers are automatically cloned on first run. See Getting Started for manual setup and troubleshooting.

Platforms

Platform Description Requirements
a2a3 Real Ascend A2/A3 hardware CANN toolkit (ccec, aarch64 cross-compiler)
a2a3sim Thread-based A2/A3 simulation gcc/g++ only (no Ascend SDK needed)
a5 Real Ascend A5 hardware CANN toolkit (ccec, aarch64 cross-compiler)
a5sim Thread-based A5 simulation gcc/g++ only (no Ascend SDK needed)

Runtime Variants

Three runtimes under src/{arch}/runtime/, each with a different graph-building strategy:

Runtime Graph built on Use case
host_build_graph Host CPU Development, debugging
aicpu_build_graph AICPU (device) Reduced host-device transfer
tensormap_and_ringbuffer AICPU (device) Production workloads

See runtime docs per arch: a2a3, a5.

Testing

# Simulation tests (no hardware)
./ci.sh -p a2a3sim

# Hardware tests (requires Ascend device)
./ci.sh -p a2a3 -d 4-7 --parallel

# Python unit tests
pytest tests -m "not requires_hardware" -v

# C++ unit tests
cmake -B tests/ut/cpp/build -S tests/ut/cpp && cmake --build tests/ut/cpp/build && ctest --test-dir tests/ut/cpp/build --output-on-failure

See Testing Guide for details.

Environment Setup

source /usr/local/Ascend/ascend-toolkit/latest/bin/setenv.bash
export ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest

Documentation

Document Description
Chip-Level Architecture L2 single-chip: three-program model (host/AICPU/AICore), API layers, handshake protocol
Distributed Level Runtime L0–L6 level model, component composition (Orchestrator / Scheduler / Worker)
Task Flow End-to-end data flow: Callable / TaskArgs / CallConfig handles, IWorker interface
Orchestrator DAG submission internals: submit flow, TensorMap, Scope, Ring, task state machine
Scheduler DAG dispatch internals: wiring/ready/completion queues, dispatch loop
Worker Manager Worker pool, WorkerThread, THREAD/PROCESS modes, fork + mailbox mechanics
Roadmap Hierarchical-runtime refactor — what has landed and what is still in flight
Getting Started Setup, prerequisites, build process, configuration
Developer Guide Directory structure, role ownership, conventions
Testing Guide CI pipeline, test types, writing new tests

Per-arch docs

Document a2a3 arch a5 arch
Runtimes a2a3/docs/runtimes.md a5/docs/runtimes.md
Platform a2a3/docs/platform.md a5/docs/platform.md

License

This project is licensed under the CANN Open Software License Agreement Version 2.0. See the LICENSE file for the full license text.

References

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • C++ 75.9%
  • Python 15.6%
  • C 5.5%
  • CMake 1.7%
  • Shell 1.3%