Control a robot with nothing but your hand β real-time gesture recognition over serial
- Overview
- Architecture
- Development Phases
- Capstone Highlight
- Getting Started
- Usage
- Testing
- Engineering Notes
- Roadmap
- Contributing
- License
A real-time hand gesture recognition system that translates five distinct hand poses into directional commands (Forward, Backward, Left, Right, Stop) and transmits them to an Arduino-driven robot over a serial connection. MediaPipe's Hand Landmarker model performs 21-point hand skeleton detection per frame, while OpenCV handles video capture and on-screen landmark visualization. The entire pipeline runs in a single Python process with sub-100ms latency from gesture to motor command.
- Real-time 21-point hand landmark detection via MediaPipe Tasks API
- Five directional gestures: Forward, Backward, Left, Right, Stop
- Live serial communication to Arduino over USB (9600 baud)
- Automatic model download on first run β zero manual setup
- On-screen landmark overlay with OpenCV rendering
- Graceful fallback when Arduino is disconnected
- Single-file, dependency-light architecture
Built with: Python 3.10+, OpenCV 4.x, MediaPipe 0.10, PySerial, Arduino (Serial).
π Repository structure
Hero_project/
βββ Hand_Gesture.py # Core application β capture, detect, command
βββ hand_landmarker.task # MediaPipe hand landmarker model (auto-downloaded)
βββ pip_download_cache/ # Cached wheel for offline installs
β βββ mediapipe-0.10.30-py3-none-win_amd64.whl
βββ README.md
System data flow:
ββββββββββββ frames ββββββββββββββ landmarks ββββββββββββββββ
β Webcam β βββββββββββΊ β OpenCV β βββββββββββββΊ β MediaPipe β
β (cv2) β β Capture β β Landmarker β
ββββββββββββ ββββββββββββββ ββββββββ¬ββββββββ
β
gesture classification
β
βΌ
ββββββββββββββ serial ββββββββββββββββ
β Arduino β βββββββββββ β Gesture β β
β Motors β F/B/L/R/S β Command Map β
ββββββββββββββ ββββββββββββββββ
| Phase | Goal | Status | Outcome |
|---|---|---|---|
| v0.1 | Basic hand detection with OpenCV + MediaPipe | β Done | Landmark overlay rendering validated |
| v0.2 | Gesture classification from landmark geometry | β Done | Five gestures reliably distinguished |
| v0.3 | Arduino serial integration for motor control | β Done | End-to-end gesture β robot movement working |
| v1.0 | Stable single-file controller with auto-download | β Done | Production-ready pipeline shipped |
Note: Status indicators follow the convention: β Complete Β· π In Progress Β· π Planned.
- Sub-100ms gesture-to-command latency β real-time enough for responsive robot control
- Zero-config model management β the MediaPipe
.taskmodel auto-downloads on first execution - Five deterministic gesture mappings derived from finger-tip vs. knuckle Y-coordinate comparisons and thumb-index pinch distance
- Graceful degradation β the system logs a warning and continues camera-only mode if no Arduino is detected
- Single 132-line Python file β the entire pipeline fits in one readable, auditable script
- Python β₯ 3.10
- pip (bundled with Python)
- USB webcam (built-in or external)
- Arduino board connected via USB (COM port)
- Arduino flashed with a sketch that reads serial bytes (
F,B,L,R,S) and drives motors accordingly
# 1. Clone the repository
git clone https://github.com/Relvixx/Hand_Gesture_Robot.git
cd Hand_Gesture_Robot
# 2. Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS / Linux
# 3. Install dependencies
pip install opencv-python mediapipe pyserial
# 4. (Optional) Install from cached wheel for offline use
pip install pip_download_cache/mediapipe-0.10.30-py3-none-win_amd64.whl| Variable | Description | Required |
|---|---|---|
ARDUINO_PORT |
Serial port for the Arduino (defaults to COM3 in code) |
No β hardcoded default |
Modify the
serial.Serial('COM3', 9600)line inHand_Gesture.pyto match your system's port if it differs fromCOM3.
# Run the gesture controller
python Hand_Gesture.py
# First run downloads the MediaPipe model (~7.5 MB) automatically
# A window titled "AI Hand Gesture Control" opens with the camera feed
# Gesture commands:
# β All fingers up β Forward (sends 'F')
# β All fingers down (fist) β Stop (sends 'S')
# βοΈ Index finger only β Backward (sends 'B')
# βοΈ Index + middle up β Left (sends 'L')
# π€ Thumb-index pinch + 3 up β Right (sends 'R')
# Press 'q' to quitTip
Keep your hand 30β60 cm from the camera with a clean, high-contrast background for the most reliable detection. Ensure adequate lighting β MediaPipe's confidence threshold is set to 0.5.
# Verify the pipeline without an Arduino attached
python Hand_Gesture.py
# The script prints gesture labels to stdout and displays landmarks on screen
# Arduino commands are silently skipped when no board is connectedNote
No formal test suite exists yet. Validation is manual: verify that the correct gesture label prints to the console for each hand pose, and confirm serial bytes arrive on the Arduino's serial monitor. See the Roadmap for planned pytest integration.
Note
Gesture classification uses raw landmark geometry, not a trained classifier. Each gesture is identified by comparing the Y-coordinates of fingertip landmarks (indices 8, 12, 16, 20) against their corresponding knuckle landmarks (indices 5, 9, 13, 17). This deterministic approach avoids training overhead and works reliably for five gestures but does not generalize to arbitrary hand signals without additional logic.
Important
The serial port is hardcoded to COM3 at 9600 baud. If your Arduino enumerates on a different port (common on Linux as /dev/ttyUSB0 or /dev/ttyACM0), you must update line 51 of Hand_Gesture.py before running. A 2-second time.sleep() after connection allows the Arduino to complete its reset cycle β removing this delay causes dropped initial commands.
Warning
The model file hand_landmarker.task is downloaded from Google's public storage on first run. If you operate in an air-gapped or firewall-restricted environment, pre-download the model manually and place it adjacent to Hand_Gesture.py. The download uses urllib.request without TLS certificate pinning or checksum verification β do not run this on untrusted networks without additional validation.
- Single-hand detection only (
num_hands=1) β the system ignores a second hand in frame - No gesture debouncing β rapid fluctuations between poses cause command flooding over serial
- The
IMAGErunning mode processes each frame independently with no temporal smoothing - Serial port and baud rate are hardcoded β no CLI argument or config file support
- No Arduino-side sketch is included in this repository
- Add CLI arguments for serial port, baud rate, and camera index
- Implement gesture debouncing with a configurable cooldown timer
- Include the companion Arduino motor-control sketch in the repo
- Add
pytesttests with mocked serial and pre-recorded video frames - Support multi-hand detection for two-handed gesture combos
- Switch to
VIDEOorLIVE_STREAMrunning mode for temporal smoothing - Package as a
pip-installable module withpyproject.toml - Add a configuration file (
config.yaml) for all tunable parameters
Contributions are welcome. Fork the repository, create a feature branch, and open a pull request against main. Keep changes focused β one feature or fix per PR. Follow PEP 8 for Python style and include clear commit messages describing the why, not just the what.
Important
Run the full application manually and verify gesture detection before opening a PR. Until automated tests are added, manual validation against all five gestures is the acceptance criterion.
Distributed under the MIT License. See LICENSE for full terms.
Built with β₯ by Relvixx Β· Hand Gesture Robot Controller Β· 2026