Real-time Depth Perception and Object Recognition Using FPGA-Accelerated Stereo Vision and Quantized Neural Networks
Author: Samuel Brandon Smith (N11064196)
Supervisor: Dr. Jasmine Banks
Institution: Queensland University of Technology
Year: 2025
This Honours project implements a real-time computer vision system combining stereo depth perception and FPGA-accelerated object recognition. The system addresses the challenge of performing computationally intensive computer vision tasks in resource-constrained embedded environments.
The initial goal was to implement both disparity mapping and CNN-based object recognition entirely on an FPGA. However, resource constraints led to a hybrid architecture:
- The Raspberry Pi 5 handles stereo capture and disparity mapping
- The PYNQ-Z1 FPGA performs accelerated CNN inference
- A TCP link over Ethernet carries real-time data between the two devices
┌─────────────────┐  Object ID (0-9)   ┌──────────────────┐
│ Raspberry Pi 5  │ <───────────────── │   PYNQ-Z1 FPGA   │
│                 │   TCP/Ethernet     │                  │
│ • Dual cameras  │  32x32 ROI data    │ • CNN inference  │
│ • Disparity map │ ─────────────────> │ • FINN compiler  │
│ • Flask server  │                    │ • Object recog.  │
│ • OpenCV        │                    │                  │
└─────────────────┘                    └──────────────────┘
        │
        V
  Web Interface
 (Live viewing)
Figure 1: Raspberry Pi 5 and PYNQ-Z1 system architecture
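The TCP exchange in Figure 1 amounts to the Pi framing a 32x32 region of interest and the FPGA answering with a single class ID (0-9). Below is a minimal sketch of one such round trip using Python sockets; the 4-byte length prefix and single-byte reply are illustrative assumptions, not the project's exact wire format:

```python
import socket
import struct
import threading

ROI_BYTES = 32 * 32  # one 8-bit grayscale 32x32 region of interest

def recv_exact(conn, n):
    """Read exactly n bytes from a socket (recv may return short reads)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf

def serve_one(listener):
    """Stand-in for the PYNQ-Z1 side: accept one ROI, reply with a class ID."""
    conn, _ = listener.accept()
    with conn:
        (length,) = struct.unpack("!I", recv_exact(conn, 4))
        roi = recv_exact(conn, length)
        # A real server would run the quantized CNN here; reply a fixed class.
        conn.sendall(struct.pack("!B", 3))

def classify_roi(host, port, roi):
    """Stand-in for the Raspberry Pi side: send one length-prefixed ROI."""
    with socket.create_connection((host, port)) as s:
        s.sendall(struct.pack("!I", len(roi)) + roi)
        (class_id,) = struct.unpack("!B", recv_exact(s, 1))
    return class_id

# Demo the round trip on localhost with a throwaway port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
t = threading.Thread(target=serve_one, args=(listener,))
t.start()
class_id = classify_roi("127.0.0.1", port, bytes(ROI_BYTES))
t.join()
listener.close()
print(class_id)  # the fixed demo reply
```

Length-prefix framing matters here because TCP is a byte stream: without it, a slow read could split one ROI across two `recv` calls or merge two ROIs into one.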
| Component | Specification | Purpose |
|---|---|---|
| FPGA Board | PYNQ-Z1 or PYNQ-Z2 | CNN inference acceleration |
| Computing Platform | Raspberry Pi 5 | Stereo vision processing |
| Cameras | 2x Raspberry Pi Camera Modules | Stereo image capture |
| Connectivity | Ethernet cable | TCP communication |
- Real-time stereo vision with dual RPI cameras
- FPGA-accelerated CNN inference using a quantized binary neural network, built from the ONNX models provided in the FINN examples
- Web-based interface via Flask server for live monitoring
- Disparity mapping using OpenCV algorithms
- TCP communication for efficient data transfer
- Optimized performance through hardware-software co-design
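The disparity mapping above relies on OpenCV's stereo block matchers, whose core idea is to slide a window along each rectified scanline and keep the horizontal shift with the lowest matching cost. A pure-Python 1-D sketch of that cost search (OpenCV's StereoBM applies the same idea over 2-D blocks, with many refinements):

```python
def disparity_1d(left, right, window=1, max_disp=4):
    """Toy 1-D sum-of-absolute-differences block matching on one scanline.
    For each left pixel, find the shift d into the right row minimising
    SAD over a small window centred on the pixel."""
    n = len(left)
    disp = [0] * n
    for x in range(window, n - window):
        best_cost, best_d = float("inf"), 0
        # d is bounded so the right-row window stays inside the image.
        for d in range(0, min(max_disp, x - window) + 1):
            cost = sum(abs(left[x + k] - right[x - d + k])
                       for k in range(-window, window + 1))
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp

# The right row is the left row shifted by 2 px, so interior pixels
# around the bright feature should recover a disparity of 2.
left = [0, 0, 10, 80, 10, 0, 0, 0]
right = [10, 80, 10, 0, 0, 0, 0, 0]
disp = disparity_1d(left, right)
print(disp)
```

Border pixels and textureless regions come out wrong even in this toy, which is why real disparity maps need texture thresholds and left-right consistency checks.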
- CIFAR-10: CNN training dataset for object classification
- Framework: Brevitas for quantization-aware training
- ETH3D Dataset: Stereo vision benchmarking
- Stereo EGO Motion Dataset: Real-world stereo sequences
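The binary networks that FINN deploys replace multiply-accumulate with XNOR-and-popcount arithmetic, which is what makes them cheap in FPGA fabric: with weights and activations in {-1, +1}, a dot product reduces to counting bit matches. A pure-Python sketch of that trick (illustrative only, not the project's code):

```python
def binarize(xs):
    """Sign-binarize real values to {-1, +1}."""
    return [1 if x >= 0 else -1 for x in xs]

def pack_bits(bs):
    """Pack a {-1, +1} vector into an integer bitmask (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, b in enumerate(bs):
        if b == 1:
            word |= 1 << i
    return word

def xnor_popcount_dot(a_bits, w_bits, n):
    """Binary dot product over n bits: matching bits contribute +1,
    mismatches -1, so dot = 2 * popcount(XNOR(a, w)) - n."""
    matches = bin(~(a_bits ^ w_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

a = binarize([0.3, -1.2, 0.8, -0.1])   # -> [+1, -1, +1, -1]
w = binarize([0.5, 0.9, -0.7, -0.4])   # -> [+1, +1, -1, -1]
dot = xnor_popcount_dot(pack_bits(a), pack_bits(w), len(a))
# Cross-check against the plain {-1, +1} dot product.
assert dot == sum(x * y for x, y in zip(a, w))
print(dot)
```

On an FPGA the XNOR and popcount map directly onto LUTs, so an entire binary convolution layer needs no DSP multipliers at all.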
Figure 2: Live disparity mapping and object detection
- Frame Rate (FPS): ~15 FPS
- System Latency: 62ms
- Classification Accuracy: 80.33%
- Detection Rate: 76.2%
- Distance Accuracy: 10.56% relative error
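The distance figure above comes from triangulating depth out of the disparity map: for a rectified stereo pair, Z = f * B / d, where f is the focal length in pixels, B the camera baseline, and d the disparity in pixels. A minimal sketch (the focal length and baseline values below are illustrative, not the project's calibration):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulated depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# e.g. 700 px focal length, 6 cm baseline, 30 px disparity -> 1.4 m
print(depth_from_disparity(700.0, 0.06, 30.0))
```

Because Z is inversely proportional to d, a fixed 1-pixel disparity error hurts far objects much more than near ones, which is one reason distance error is reported as a relative percentage.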
- Implemented hybrid FPGA-RPI architecture
- Real-time disparity mapping at ~15 FPS
- FPGA-accelerated object recognition
- Web-based monitoring interface
This project was completed as part of my Engineering Honours degree at Queensland University of Technology under the supervision of Dr. Jasmine Banks. The work explores the intersection of computer vision, FPGA acceleration, and embedded systems design.
If you use this work in your research, publications, or projects, please cite:
@misc{smith2025disparity,
author = {Samuel Brandon Smith},
title = {Real-time Depth Perception and Object Recognition Using FPGA-Accelerated Stereo Vision and Quantized Neural Networks},
year = {2025},
publisher = {QUT},
howpublished = {\url{https://github.com/SoftwareSystemSam/Disparity-Mapping-and-CNN-on-PYNQ-Z1}},
note = {QUT Honours Project}
}

This project is provided "as is" without warranty of any kind. Use at your own risk. This was developed as an Engineering Honours project and may contain bugs or incomplete features.
- Author: Samuel Brandon Smith
- Student ID: N11064196
- Email: n11064196@qut.edu.au or georgesamsquo@hotmail.com
- Supervisor: Dr. Jasmine Banks
- Institution: Queensland University of Technology
