A set of steps to build HLS-based-accelerated PyTorch on PYNQ-support devices, especially on ARM-v7l architecture.
The project is mainly used as a study and verification platform for those who are working on AI hardware.
hw/ and backend/ folder gives a shabby implementation for how to register a torch backend in python in an easy way. It's very suitable for short-term verification for HLS algorithm.
PYNQtorch/
├── notebook/
| ├── backend_example/ # backend example
| | ├── example_EN.ipynb # backend example in English
| | └── example_CN.ipynb # backend example in Chinese
| └── hw_test/ # hardware test
│ └── matrix_mult.ipynb # Jupyter notebook
├── hw/
│ ├── kernel/ # HLS kernel
| | └── <kernel_name>/
│ │ ├── <kernel_name>.cpp # kernel source
│ │ ├── <kernel_name>_tb.cpp # kernel testbench
│ │ ├── <kernel_name>.hpp # kernel header
│ │ ├── description.json # kernel description
| │ ├── hls_config.cfg # HLS config
| │ └── run_hls.tcl # HLS Tcl
│ └── overlay/ # Vivado overlay
| └── <overlay_name>/
│ ├── *.bit # FPGA bitstream
│ ├── *.hwh # FPGA hardware handoff
│ └── run_vivado.tcl # synth & impl Tcl
├── sw/
| ├── pytorch (submodule) # PyTorch v2.8
| ├── audio (submodule) # TorchAudio v2.8
| └── vision (submodule) # TorchVision v0.23
├── backend/
| └── pytorch_zynq/ # custom backend example
| ├── __init__.py # entrance
| ├── device.py # attribute definition
| ├── hardware.py # PYNQ overlay layer
| ├── linear.py # custom linear binding
| └── ops.py # fitting sw operators to hw
├── scripts/
| └── environments.sh # set compilation flags
├── LICENSE
└── README.md
- PYNQ Z2 (typical)
- PYNQ v3.1.1
- ENVs (if you want to build by yourself)
- clone this repository to your board
- copy
backend/pytorch_zynq/to your workspace - import backend & play
Building from source may require some patience to succeed. However, hardware part should be easy so, start with hardware building.
cd hw/kernel/<kernel_name>
vitis-run --mode hls --tcl run_hls.tclcd hw/overlay/<overlay_name>
vivado -mode batch -source run_vivado.tclIt seems like building a PyTorch is somewhat a cake, but it will take you some time for sure!
Make sure you are using Linux. The compilation is not working on Windows.
-
Update your source
sudo apt update sudo apt upgrade
-
Install qemu-user-static for binfmt and other dependencies
sudo apt install qemu-user-static debootstrap -y
-
Make a chroot in ARM-v7l
Using Trixie is reasonable, I don't want to explain that.
sudo debootstrap --arch=armhf trixie /path/to/your/chroot
-
Run chroot
sudo mount -t sysfs sysfs /path/to/your/chroot/sys sudo mount -t proc proc /path/to/your/chroot/proc sudo mount --bind /dev /path/to/your/chroot/dev sudo chroot /path/to/your/chroot
-
Copy your sysroot
The
GLIBClib inTrixierelease is too new, and we also need some libraries in your board in case of something unsuitable. Then put all of them into a folder called "sysroot".You just need to download (or copy) the folders below:
/root/sysroot/ (path recommended) ├── lib/ # path /lib on your board └── usr/ ├── include/ # path /usr/include on your board └── lib/ # path /usr/lib on your board -
Patch your ARM-v7l container
Get all package you need and clone this repository.
Note that PyTorch recommends GCC 11. A newer compiler could lead to a series of bugs.
Replace your apt mirror release to
sid, or you will find gcc-11 is missing, lol.Get
pyenv, click here to find how to install.Install python 3.10.4 by
pyenvpyenv install 3.10.4 pyenv global 3.10.4
Feel free to use virtual environment.
Then, you need to init torch source code as there are lots of submodules.
$(this repository) git submodule init $(this repository) git submodule update --init --recursive
This may take a while, drink some tea then.
After you totally clone the source code, run
pip install -r requirements.txtto fit all dependencies the torch need, and some pip packages will take some time to compile from source. Kill time before that done.When all mentioned were taken, you are now running the last step before torch compilation.
$(this repository) source scripts/environments.sh
This will create a set of typical compiling environment flags, you need to make sure that some performance flags and dependency paths (e.g.
sysroot) suit your system, though. -
Compile torch
$(this repository) python sw/pytorch/setup.py bdist_wheel
This will create a binary wheel of torch after compilation, if there is no other errors. You may need to fix some cute errors by yourself.
-
Compile vision and audio library (optional)
The procedure is similar to torch compilation. Use
environments.shto help you.
We provide some Jupyter notebooks to help you get started. You can find them in the notebook/ folder.
Note that you must need to create your own workspace to run the notebooks, including copy backend/pytorch_zynq/ and hardware config files to the right place. It's not an open-box solution.
We know ZYNQ series vary in performance. On the typical ZYNQ 7020 (-1 speed), the hardware matrix multiplier gives a 22x acceleration rate on average. Feel free to explore the performance on different devices.
AGPL-3.0 license