Real-time Depth Perception and Object Recognition Using FPGA-Accelerated Stereo Vision and Quantized Neural Networks
Author: Samuel Brandon Smith (N11064196)
Supervisor: Dr. Jasmine Banks
Institution: Queensland University of Technology
Year: 2025
This Honours project implements a real-time computer vision system combining stereo depth perception and FPGA-accelerated object recognition. The system addresses the challenge of performing computationally intensive computer vision tasks in resource-constrained embedded environments.
The initial goal was to implement both disparity mapping and CNN-based object recognition entirely on an FPGA. However, resource constraints led to a hybrid architecture:
- The Raspberry Pi 5 handles stereo capture and disparity mapping
- The PYNQ-Z1 FPGA performs accelerated CNN inference
- A TCP link over Ethernet carries real-time data between the two devices
┌─────────────────┐  Object ID (0-9)   ┌──────────────────┐
│ Raspberry Pi 5  │ <───────────────── │   PYNQ-Z1 FPGA   │
│                 │   TCP/Ethernet     │                  │
│ • Dual cameras  │  32x32 ROI data    │ • CNN inference  │
│ • Disparity map │ ─────────────────> │ • FINN compiler  │
│ • Flask server  │                    │ • Object recog.  │
│ • OpenCV        │                    │                  │
└─────────────────┘                    └──────────────────┘
        │
        V
  Web Interface
 (Live viewing)
Figure 1: Raspberry Pi 5 and PYNQ-Z1 system architecture
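The TCP exchange in Figure 1 amounts to the Pi framing a 32x32 region of interest and the FPGA answering with a single class ID (0-9). Below is a minimal sketch of one such round trip using Python sockets; the 4-byte length prefix and single-byte reply are illustrative assumptions, not the project's exact wire format:

```python
import socket
import struct
import threading

ROI_BYTES = 32 * 32  # one 8-bit grayscale 32x32 region of interest

def recv_exact(conn, n):
    """Read exactly n bytes from a socket (recv may return short reads)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf

def serve_one(listener):
    """Stand-in for the PYNQ-Z1 side: accept one ROI, reply with a class ID."""
    conn, _ = listener.accept()
    with conn:
        (length,) = struct.unpack("!I", recv_exact(conn, 4))
        roi = recv_exact(conn, length)
        # A real server would run the quantized CNN here; reply a fixed class.
        conn.sendall(struct.pack("!B", 3))

def classify_roi(host, port, roi):
    """Stand-in for the Raspberry Pi side: send one length-prefixed ROI."""
    with socket.create_connection((host, port)) as s:
        s.sendall(struct.pack("!I", len(roi)) + roi)
        (class_id,) = struct.unpack("!B", recv_exact(s, 1))
    return class_id

# Demo the round trip on localhost with a throwaway port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
t = threading.Thread(target=serve_one, args=(listener,))
t.start()
class_id = classify_roi("127.0.0.1", port, bytes(ROI_BYTES))
t.join()
listener.close()
print(class_id)  # the fixed demo reply
```

Length-prefix framing matters here because TCP is a byte stream: without it, a slow read could split one ROI across two `recv` calls or merge two ROIs into one.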
| Component | Specification | Purpose |
|---|---|---|
| FPGA Board | PYNQ-Z1 or PYNQ-Z2 | CNN inference acceleration |
| Computing Platform | Raspberry Pi 5 | Stereo vision processing |
| Cameras | 2x Raspberry Pi Camera Modules | Stereo image capture |
| Connectivity | Ethernet cable | TCP communication |
- Real-time stereo vision with dual RPI cameras
- FPGA-accelerated CNN inference using a quantized binary neural network, built from the ONNX models provided in the FINN examples
- Web-based interface via Flask server for live monitoring
- Disparity mapping using OpenCV algorithms
- TCP communication for efficient data transfer
- Optimized performance through hardware-software co-design
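The disparity mapping above relies on OpenCV's stereo block matchers, whose core idea is to slide a window along each rectified scanline and keep the horizontal shift with the lowest matching cost. A pure-Python 1-D sketch of that cost search (OpenCV's StereoBM applies the same idea over 2-D blocks, with many refinements):

```python
def disparity_1d(left, right, window=1, max_disp=4):
    """Toy 1-D sum-of-absolute-differences block matching on one scanline.
    For each left pixel, find the shift d into the right row minimising
    SAD over a small window centred on the pixel."""
    n = len(left)
    disp = [0] * n
    for x in range(window, n - window):
        best_cost, best_d = float("inf"), 0
        # d is bounded so the right-row window stays inside the image.
        for d in range(0, min(max_disp, x - window) + 1):
            cost = sum(abs(left[x + k] - right[x - d + k])
                       for k in range(-window, window + 1))
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp

# The right row is the left row shifted by 2 px, so interior pixels
# around the bright feature should recover a disparity of 2.
left = [0, 0, 10, 80, 10, 0, 0, 0]
right = [10, 80, 10, 0, 0, 0, 0, 0]
disp = disparity_1d(left, right)
print(disp)
```

Border pixels and textureless regions come out wrong even in this toy, which is why real disparity maps need texture thresholds and left-right consistency checks.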
- CIFAR-10: CNN training dataset for object classification
- Framework: Brevitas for quantization-aware training
- ETH3D Dataset: Stereo vision benchmarking
- Stereo EGO Motion Dataset: Real-world stereo sequences
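The binary networks that FINN deploys replace multiply-accumulate with XNOR-and-popcount arithmetic, which is what makes them cheap in FPGA fabric: with weights and activations in {-1, +1}, a dot product reduces to counting bit matches. A pure-Python sketch of that trick (illustrative only, not the project's code):

```python
def binarize(xs):
    """Sign-binarize real values to {-1, +1}."""
    return [1 if x >= 0 else -1 for x in xs]

def pack_bits(bs):
    """Pack a {-1, +1} vector into an integer bitmask (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, b in enumerate(bs):
        if b == 1:
            word |= 1 << i
    return word

def xnor_popcount_dot(a_bits, w_bits, n):
    """Binary dot product over n bits: matching bits contribute +1,
    mismatches -1, so dot = 2 * popcount(XNOR(a, w)) - n."""
    matches = bin(~(a_bits ^ w_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

a = binarize([0.3, -1.2, 0.8, -0.1])   # -> [+1, -1, +1, -1]
w = binarize([0.5, 0.9, -0.7, -0.4])   # -> [+1, +1, -1, -1]
dot = xnor_popcount_dot(pack_bits(a), pack_bits(w), len(a))
# Cross-check against the plain {-1, +1} dot product.
assert dot == sum(x * y for x, y in zip(a, w))
print(dot)
```

On an FPGA the XNOR and popcount map directly onto LUTs, so an entire binary convolution layer needs no DSP multipliers at all.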
Figure 2: Live disparity mapping and object detection
- Frame Rate (FPS): ~15 FPS
- System Latency: 62ms
- Classification Accuracy: 80.33%
- Detection Rate: 76.2%
- Distance Accuracy: 10.56% relative error
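The distance figure above comes from triangulating depth out of the disparity map: for a rectified stereo pair, Z = f * B / d, where f is the focal length in pixels, B the camera baseline, and d the disparity in pixels. A minimal sketch (the focal length and baseline values below are illustrative, not the project's calibration):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulated depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# e.g. 700 px focal length, 6 cm baseline, 30 px disparity -> 1.4 m
print(depth_from_disparity(700.0, 0.06, 30.0))
```

Because Z is inversely proportional to d, a fixed 1-pixel disparity error hurts far objects much more than near ones, which is one reason distance error is reported as a relative percentage.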
- Implemented hybrid FPGA-RPI architecture
- Real-time disparity mapping at ~15 FPS
- FPGA-accelerated object recognition
- Web-based monitoring interface
This project was completed as part of my Engineering Honours degree at Queensland University of Technology under the supervision of Dr. Jasmine Banks. The work explores the intersection of computer vision, FPGA acceleration, and embedded systems design.
If you use this work in your research, publications, or projects, please cite:
@misc{smith2025disparity,
author = {Samuel Brandon Smith},
title = {Real-time Depth Perception and Object Recognition Using FPGA-Accelerated Stereo Vision and Quantized Neural Networks},
year = {2025},
publisher = {QUT},
howpublished = {\url{https://github.com/SoftwareSystemSam/Disparity-Mapping-and-CNN-on-PYNQ-Z1}},
note = {QUT Honours Project}
}

This project is provided "as is" without warranty of any kind. Use at your own risk. This was developed as an Engineering Honours project and may contain bugs or incomplete features.
- Author: Samuel Brandon Smith
- Student ID: N11064196
- Email: n11064196@qut.edu.au or georgesamsquo@hotmail.com
- Supervisor: Dr. Jasmine Banks
- Institution: Queensland University of Technology
