Retail checkout systems have remained largely unchanged for decades. Most supermarkets still rely on barcode-based checkout systems, where every product must be manually scanned.
Although barcode systems are reliable, they introduce several operational limitations.
In traditional checkout systems:
• Each item must be individually scanned
• The cashier must manually locate the barcode
• Damaged or unreadable barcodes cause delays
• Human error can occur during product identification
When customers purchase many products, the checkout process becomes slow and inefficient.
With the rapid development of Artificial Intelligence and Computer Vision, it has become possible to automate product identification using visual recognition instead of barcode scanning.
This project explores a Vision-Based Smart Cashier System, where products are automatically recognized using a camera and deep learning models.
The system demonstrates how AI, Web Systems, and Embedded Hardware can be integrated to create an intelligent retail checkout solution.
Traditional barcode checkout systems suffer from multiple limitations.
Each product must be scanned individually.
Example:
20 products → 20 barcode scans
This increases customer waiting time.
The process depends heavily on the cashier.
Human errors may include:
• scanning the wrong item
• scanning an item twice
• failing to detect a damaged barcode
Barcode systems cannot operate autonomously.
They require:
• manual scanning
• specialized hardware
The proposed solution is a Vision-Based Smart Cashier System.
Instead of scanning barcodes, the system identifies products visually using Artificial Intelligence.
The user simply places a product in front of a camera.
The system automatically:
- Captures the product image
- Detects the object using YOLOv8
- Verifies the product using feature embeddings
- Adds the product to the invoice
- Displays the result in the web interface
- Sends feedback signals to hardware devices
The following image shows the actual physical prototype of the Vision-Based Smart Cashier device.
This device was built as a working demonstration of the smart cashier system operating in a real environment.
The prototype includes:
• ESP32 microcontroller
• LCD screen for product information
• Green LED indicator (successful recognition)
• Red LED indicator (unknown product)
• Buzzer for sound feedback
• Camera mounted above the device for product capture
The camera captures the product placed in front of the system.
The image is processed by the AI backend, which performs:
- Object detection using YOLOv8
- Product verification using feature embeddings
- Invoice generation
- Hardware feedback through ESP32
The system integrates four major subsystems: the web frontend, the FastAPI backend, the AI models (YOLOv8 detection and MobileNet embedding verification), and the ESP32 hardware.
┌─────────────────────────┐
│         Camera          │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│      Web Frontend       │
│  HTML + JavaScript UI   │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│         FastAPI         │
│     Backend Server      │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│         YOLOv8          │
│   Object Detection AI   │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│   MobileNet Embedding   │
│  Feature Verification   │
└────────────┬────────────┘
             │
        ┌────┴─────────────┐
        ▼                  ▼
Invoice Generation   ESP32 Hardware
                           │
                  LCD + LED + Buzzer
The AI pipeline contains two stages.
Camera Frame
↓
Image Decoding
↓
YOLOv8 Detection
↓
Bounding Box Extraction
↓
Product Crop
↓
MobileNet Feature Extraction
↓
Cosine Similarity Comparison
↓
Product Label Confirmation
This two-stage approach increases reliability: YOLOv8 localizes candidate products, and the embedding check rejects false or visually similar detections.
The embedding verification step compares feature vectors using Cosine Similarity.
The similarity between two vectors is defined as:
similarity = (A · B) / (||A|| × ||B||)
Where:
• A = detected product embedding
• B = reference product embedding
The result ranges between:
-1 → completely different
1 → identical
The system accepts a product if:
similarity ≥ 0.88
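As a minimal sketch, the acceptance check above can be written with NumPy. The formula and the 0.88 threshold follow the description above; the short example vectors are made-up illustrations (real MobileNet embeddings are much longer).

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.88  # acceptance threshold described above

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """similarity = (A . B) / (||A|| * ||B||), in the range [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_known_product(detected: np.ndarray, reference: np.ndarray) -> bool:
    """Accept the detection only if the embeddings are similar enough."""
    return cosine_similarity(detected, reference) >= SIMILARITY_THRESHOLD

# Illustrative vectors only:
ref = np.array([0.9, 0.1, 0.4])
same = np.array([0.88, 0.12, 0.41])   # near-identical product
other = np.array([0.1, 0.9, 0.2])     # different product

print(is_known_product(same, ref))    # True
print(is_known_product(other, ref))   # False
```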
The technology stack:
• AI: YOLOv8, MobileNetV3, PyTorch
• Backend: Python, FastAPI, OpenCV, NumPy
• Frontend: HTML, CSS, JavaScript, Web Camera API
• Hardware: ESP32, LCD Display (I2C), LED Indicators, Buzzer
smart_cashier_local2
│
├── api
│   └── main.py
│
├── dataset
│   ├── train
│   ├── valid
│   ├── test
│   └── data.yaml
│
├── docs
│   ├── system_interface1.png
│   ├── system_interface2.png
│   ├── object_detection.png
│   ├── product_added.png
│   ├── hardware_esp32_setup.png
│   └── real_hardware_device.png
│
├── frontend
│   ├── index.html
│   └── images
│
├── model
│   └── best.pt
│
├── product_images
│
├── runs
│
├── tools
│   └── cloudflared.exe
│
├── train_yolov8.py
├── requirements.txt
└── yolov8n.pt
The file api/main.py is the core of the entire system.
It performs the following tasks:
• loads the YOLO model
• processes camera frames
• runs AI inference
• verifies embeddings
• generates invoices
• communicates with ESP32
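The invoice-handling task can be sketched in plain Python. This is a hypothetical illustration, not the actual main.py code; the Invoice class, its field layout, and the sample products and prices are all assumptions.

```python
# Hypothetical sketch of the invoice logic (not the actual main.py code).
from dataclasses import dataclass, field

@dataclass
class Invoice:
    # Maps product name -> [unit price, quantity]
    items: dict = field(default_factory=dict)

    def add_product(self, name: str, price: float) -> None:
        """Add a recognized product; repeated detections increase the quantity."""
        if name in self.items:
            self.items[name][1] += 1
        else:
            self.items[name] = [price, 1]

    def total(self) -> float:
        """Sum of price * quantity over all invoice entries."""
        return sum(price * qty for price, qty in self.items.values())

    def reset(self) -> None:
        """Clear the invoice (the 'invoice reset' control in the UI)."""
        self.items.clear()

invoice = Invoice()
invoice.add_product("Noodles", 150)   # illustrative product and price
invoice.add_product("Noodles", 150)
invoice.add_product("Juice", 90)
print(invoice.total())                # 390
```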
The frontend is responsible for:
• displaying the user interface
• activating the camera
• sending frames to the server
• displaying detection results
• managing the invoice
The interface is divided into three sections.
The control panel contains system controls:
• product detection mode
• product management
• invoice reset
The invoice panel displays the detected products.
Each product entry shows:
• product image
• product name
• price
• quantity
The camera panel displays the live camera feed.
Detected products are highlighted with bounding boxes.
Example:
Noodles (87%)
The system integrates an ESP32 microcontroller connected via USB serial communication.
Hardware components:
• ESP32
• LCD screen
• Green LED
• Red LED
• Buzzer
When a product is successfully recognized, the backend sends:
OK:ProductName:Price
Example:
OK:Noodles:150
Hardware response:
• LCD displays product name
• LCD displays product price
• Green LED turns ON
• Short confirmation beep
When the product cannot be identified, the backend sends:
ERR
Hardware response:
• LCD displays "Unknown Item"
• Red LED turns ON
• Long warning beep
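The messages above are plain text sent over USB serial. The sender side could be sketched as follows; this is an illustration rather than the actual backend code, and the serial port name and baud rate are assumptions.

```python
from typing import Optional

def build_feedback(product: Optional[str] = None, price: Optional[int] = None) -> bytes:
    """Encode the protocol described above:
    OK:<ProductName>:<Price> on success, ERR for an unknown product."""
    if product is None:
        return b"ERR\n"
    return f"OK:{product}:{price}\n".encode()

print(build_feedback("Noodles", 150))  # b'OK:Noodles:150\n'
print(build_feedback())                # b'ERR\n'

# Sending to the ESP32 would use a serial library such as pyserial, e.g.:
#   import serial
#   esp = serial.Serial("COM3", 115200)  # port and baud rate are assumptions
#   esp.write(build_feedback("Noodles", 150))
```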
Start the backend server:
uvicorn api.main:app --host 0.0.0.0 --port 8000
Open the system locally at http://localhost:8000 in a browser.
Allow camera access.
Place products in front of the camera.
Press Enter to confirm product addition.
To allow others to access the system remotely, the project uses Cloudflare Tunnel.
Run this command in another terminal:
.\tools\cloudflared.exe tunnel --url http://localhost:8000
Cloudflare will generate a public URL such as:
https://random-name.trycloudflare.com
Anyone with this link can access the Smart Cashier system remotely.
The dataset used to train the model follows the YOLO format.
dataset
├── train
├── valid
└── test
Each image has a corresponding annotation file containing bounding box coordinates.
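In the YOLO format, each annotation file holds one line per object: a class index followed by the bounding box center, width, and height, all normalized to the 0–1 range. A hypothetical label file for an image containing a single product might look like:

```
0 0.512 0.430 0.300 0.250
```

Here the values are illustrative; 0 is the class index and the four numbers are x_center, y_center, width, and height relative to the image size.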
The model was trained using YOLOv8.
Training script:
train_yolov8.py
After training, the best model is saved as:
model/best.pt
Possible improvements include:
• multi-camera checkout systems
• cloud-based product databases
• automatic inventory management
• edge AI deployment
• fully cashier-less stores
This project demonstrates the integration of Artificial Intelligence, Web Technologies, and Embedded Systems.
By replacing traditional barcode scanners with vision-based product recognition, the system enables faster, smarter, and more automated retail experiences.
Engineers: Raidan Al-khateeb and Mohammed Al-wosabi
Artificial Intelligence Engineering Students
Focus Areas:
Computer Vision
Deep Learning
Embedded AI Systems
Smart Retail Technologies