Project Duration: 2026.02.09 ~ 2026.02.25 (16 Days)
Wall-E is an automated inspection system that uses drone imagery to detect cracks and defects on building exteriors. It combines computer vision with a three-track asynchronous streaming architecture to identify, classify, and visualize structural anomalies, making the maintenance workflow safer and more efficient.
Video captured by the drone is streamed to the user's smartphone (Flutter app), while a backend video server acts as the intermediary, handling real-time AI analysis and duplicate filtering.
graph TD
Drone["🚁 Drone (DJI Phantom 4 Pro)"]
RTMP["📡 Media Server (MediaMTX)<br/>[RTMP 1935]"]
Backend["💻 Backend Server (FastAPI/Python)<br/>[HTTP 8000]"]
DB[("🐘 Database<br/>(Supabase PostgreSQL)")]
App["📱 Client App (Flutter)<br/>[Android/iOS]"]
Drone -- "1. Raw Live Stream<br/>(720p, 30fps)" --> RTMP
RTMP -- "2. Video Stream Polling" --> Backend
subgraph "AI Engine (3-Track Async)"
Backend -- "3a. Crack Detection (YOLOv11n)" --> AI
Backend -- "3b. Privacy Blur (Window Model)" --> AI
AI -- "3c. Re-ID Deduplication" --> AI
end
Backend -- "4. AI Results & Images<br/>(Supabase Auth context)" --> DB
Backend -- "5. Mobile-Optimized Stream<br/>(MJPEG with Burn-in BBox)" --> App
Backend -- "6. Real-time Metadata<br/>(WebSocket/JSON)" --> App
App -- "7. Manual Capture Requests" --> Backend
App -- "8. Load Gallery & History" --> DB
👉 View Project Kanban Board (GitHub Projects)
| Role | Responsibilities | Key Focus Areas |
|---|---|---|
| Project Architect (PA) | System Architecture, Tech Direction, Documentation | 3-Track Async Architecture Design, System Integration |
| Backend Developer | Server Architecture, API, DB Design | FastAPI, Supabase, REST API, StreamManager Multi-threading |
| AI Model Developer | Model Training (YOLO), Re-ID Logic | Albumentations Augmentation, MobileNetV3 Embedding, Inference |
| Frontend Developer | Mobile App Development (Flutter) | Floating Action Button (FAB), MJPEG Live Stream View, State Management |
The core engine for real-time inspection, duplicate prevention, and privacy protection.
- Multi-Model Detection Pipeline:
- Crack Detection: YOLOv11n (Nano) - Optimized for ultra-fast structural anomaly recognition.
- Privacy Protection: Built-in Window Detection model. Automatically applies Gaussian Blur (Pixel-level) to building interiors to protect resident privacy during flight.
- Duplicate Filtering (Re-Identification): Powered by the `MobileNetV3-small` model.
- Prevents storing multiple images of the same crack caused by drone shaking or hovering.
- Compares 576-dimensional embeddings using Cosine Similarity (80% Threshold).
- Advanced Augmentation (Albumentations):
- Simulates vertical flight shaking, motion blur, and harsh sunlight.
- Uses Hard Negative Mining to treat wires and tile joints as Background to minimize False Positives.
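The duplicate-filtering step above can be illustrated with a minimal, standard-library sketch. It assumes embeddings arrive as plain lists of floats (576 values each, as stated above) and that a similarity at or above 0.80 counts as a duplicate. The `EmbeddingGallery` class and its method names are hypothetical and are not taken from the project code.

```python
import math
from typing import List, Sequence

SIMILARITY_THRESHOLD = 0.80  # the 80% threshold quoted above


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


class EmbeddingGallery:
    """Hypothetical in-memory store of embeddings for cracks already saved."""

    def __init__(self, threshold: float = SIMILARITY_THRESHOLD) -> None:
        self.threshold = threshold
        self._saved: List[List[float]] = []

    def is_duplicate(self, embedding: Sequence[float]) -> bool:
        """True if any stored embedding is at least `threshold` similar."""
        return any(
            cosine_similarity(embedding, saved) >= self.threshold
            for saved in self._saved
        )

    def add_if_new(self, embedding: Sequence[float]) -> bool:
        """Store the embedding and return True only when it is not a duplicate."""
        if self.is_duplicate(embedding):
            return False
        self._saved.append(list(embedding))
        return True
```

In this sketch a detection would be written to the database only when `add_if_new` returns `True`. A linear scan is enough for the handful of cracks seen in one mission; a larger gallery would likely need a vectorized comparison.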
A low-latency architecture designed for concurrent multi-model inference.
- Thread 1 (Reception): Dedicated to dumping raw drone frames into a high-speed buffer.
- Thread 2 (Inference): Orchestrates the AI Pipeline (Crack Detect -> Window Blur -> ReID -> DB Save).
- Thread 3 (Broadcast):
- Video Channel: Streams MJPEG (720p 30FPS) with BBox burn-in for instant user feedback.
- Metadata Channel: Dispatches real-time JSON detection alerts via WebSocket to drive the Flutter UI state.
- Storage: Supabase handles all structural data, authentication, and secure image hosting.
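The three-thread split described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's `StreamManager`: the names `put_latest` and `run_pipeline` are hypothetical, frames are treated as opaque objects, and the bounded queues drop the oldest frame when full so that a slow inference stage never stalls reception.

```python
import queue
import threading
from typing import Any, Callable, Iterable

_STOP = object()  # sentinel that tells the next stage to shut down


def put_latest(q: "queue.Queue[Any]", item: Any) -> None:
    """Enqueue without blocking; if the queue is full, drop the oldest entry."""
    while True:
        try:
            q.put_nowait(item)
            return
        except queue.Full:
            try:
                q.get_nowait()  # discard the stalest item to make room
            except queue.Empty:
                pass  # a consumer emptied the queue first; just retry


def run_pipeline(
    frames: Iterable[Any],
    infer: Callable[[Any], Any],
    send: Callable[[Any], None],
    buffer_size: int = 2,
) -> None:
    """Run reception, inference, and broadcast on three separate threads."""
    raw_q: "queue.Queue[Any]" = queue.Queue(maxsize=buffer_size)
    out_q: "queue.Queue[Any]" = queue.Queue(maxsize=buffer_size)

    def reception() -> None:  # Thread 1: push raw frames into the buffer
        try:
            for frame in frames:
                put_latest(raw_q, frame)
        finally:
            put_latest(raw_q, _STOP)

    def inference() -> None:  # Thread 2: run the AI step on each frame
        try:
            while True:
                frame = raw_q.get()
                if frame is _STOP:
                    return
                put_latest(out_q, infer(frame))
        finally:
            put_latest(out_q, _STOP)  # always release the broadcast thread

    def broadcast() -> None:  # Thread 3: hand results to the client side
        while True:
            result = out_q.get()
            if result is _STOP:
                return
            send(result)

    threads = [
        threading.Thread(target=stage, name=stage.__name__)
        for stage in (reception, inference, broadcast)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With a small `buffer_size` the pipeline favors freshness over completeness: frames may be skipped, but the ones that reach `send` stay in order. That trade-off suits a live preview, though saved detections might need a lossless path.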
The interface for monitoring the low-latency live video and controlling drone inspections.
- Real-time Monitoring: Uses `flutter_mjpeg` to sustain 30FPS without black screens or memory leaks.
- Manual Capture: Users can record hazards with a Floating Action Button (FAB) during live streaming. Captures are saved to the DB immediately (`is_manual=true`) and shown with a visually distinct badge.
- Gallery & Map Integration: Google Maps screens show mission locations, and results can be reviewed with bounding boxes toggled on or off.
- Language: Python 3.10+, Dart 3.3+
- AI: YOLOv11 (`ultralytics`), PyTorch (MobileNetV3), OpenCV, Albumentations
- Backend: FastAPI, SQLAlchemy, PostgreSQL (`psycopg2`)
- Frontend: Flutter 3.19+
- DB/Auth/Storage: Supabase (Cloud)
- Streaming Server: MediaMTX (RTMP)
- Stream Route: `rtmp://<Server-IP>:1935/live/drone`
cd backend
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
※ Requires the offline AI model weights (`backend/models/mobile_net_v3_small.pth`).
Configure Supabase connection info in the .env file.
DATABASE_URL=postgresql://user:pass@host:6543/postgres
SUPABASE_URL=...
SUPABASE_KEY=...
RTMP_URL=rtmp://localhost:1935/live/drone
- The `mediamtx` server currently runs on a team member's Windows/Linux desktop, not on this MacBook.
- Port forwarding is configured on port `1935` so that the drone and the backend server can reach the video feed externally.
- There is no need to start the media server manually on this local machine.
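As an illustration of how the backend might validate these settings at startup, here is a small sketch. It assumes the `.env` values have already been loaded into the process environment (for example by the shell or a dotenv loader). The `Settings` dataclass and `load_settings` function are hypothetical names, not part of the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# The four keys shown in the .env example above.
REQUIRED_KEYS = ("DATABASE_URL", "SUPABASE_URL", "SUPABASE_KEY", "RTMP_URL")


@dataclass(frozen=True)
class Settings:
    database_url: str
    supabase_url: str
    supabase_key: str
    rtmp_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required settings and fail fast with a readable error."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    if not env["RTMP_URL"].startswith("rtmp://"):
        raise RuntimeError("RTMP_URL must start with rtmp://")
    return Settings(
        database_url=env["DATABASE_URL"],
        supabase_url=env["SUPABASE_URL"],
        supabase_key=env["SUPABASE_KEY"],
        rtmp_url=env["RTMP_URL"],
    )
```

Failing at startup with the list of missing keys is usually easier to debug than a connection error that surfaces later inside the stream reader.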
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Start Frontend App
cd frontend
flutter run
- Unified DB/Auth Infrastructure: Configured the initial Supabase PostgreSQL schema and deployed functional Login/Signup endpoints via FastAPI.
- Real-Time Baseline: Bootstrapped MediaMTX integrations, foundational OpenCV frame captures, and localized Bounding Box overlays.
- Object Tracking Implementation: Integrated YOLO's `model.track()` and an active ID cache `Set` to prevent multi-saves of static drone footage.
- Core Flutter Scaffolding: Structured the baseline App UI routes covering the New Mission, Gallery, and Profile screens.
- Frontend Type Safety Assured: Solidified API responses by introducing strict `Mission`, `Detection`, and `User` Dart models.
- Zero-Delay & BBox Ratio Fix: Removed pre-inference padding to precisely map bounding box coordinates over raw frames, eliminating y-axis displacement.
- Gallery UI Upgrades: Implemented PageView swipe navigation and responsive Landscape/Portrait bounds on the detail screen.
- Backend Metadata Hierarchy Swap: Moved the GPS fields from the detections table up to the missions table to simplify the schema.
- Google Maps Integration: Replaced static map placeholders with the Google Maps Static API to show mission locations and GPS coordinates.
- Data Isolation (RLS): Enforced Supabase Row Level Security so that users can access only their own flight records.
- Streaming Protocol Shift (HLS ➔ MJPEG): Avoided HLS buffering latency (3-5s) and WebSocket desync bugs by having the backend burn bounding boxes directly into the MJPEG frames.
- OpenCV Backend Auto-Detection: Dropped the explicit `CAP_FFMPEG` backend selection so that OpenCV picks a backend automatically, keeping hardware-accelerated video handling stable on Apple Silicon Macs and on server hardware.
- UX & Localization Bug Fixes: Fixed Korean UTF-8 encoding corruption bugs and finalized the Android permission configuration.
- 3-Track Async Streaming Architecture: Fully decoupled video reception, AI inference, and client broadcast, removing the stuttering bottleneck and minimizing end-to-end latency.
- Mobile Streaming Optimization: Resolved stuttering and black screens by adopting `flutter_mjpeg` and enforcing 720p 30FPS server-side resizing.
- MobileNetV3 Deduplication (Re-ID): Mitigated the duplicate saves that tracking alone could not prevent under drone wobble, using embedding Cosine Similarity (80% Threshold / 10% Margin).
- Manual Capture Integration: Let users take snapshots on demand via a FAB on the live view; these are tagged in the DB as explicit "Manual Captures".
- Augmentation Pipeline Plan: Drafted robust data strategies detailing vertical-shift focus and Hard Negative Mining (e.g. electrical wires, concrete joints) via Albumentations.
- Privacy Mode (Window Model): Integrated a secondary YOLO model to detect and blur windows in real-time, ensuring privacy compliance during urban inspections.
- WebSocket Metadata Channel: Implemented a standalone metadata stream to separate heavy image data from lightweight detection events.
- Final Result Report (HTML/PPT): Developed a high-fidelity interactive HTML presentation and automated PPT generation scripts for stakeholders.
- Style Synchronization: Aligned the detailed AI analysis slides between the technical prototypes and the final reports.
- System Hardening: Optimized MacBook hardware resource allocation for dual-model concurrent execution.
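For readers unfamiliar with the MJPEG delivery mentioned in the streaming protocol notes above, the following sketch shows how already-encoded JPEG frames are commonly framed as a `multipart/x-mixed-replace` HTTP body. The boundary token `frame` and the function names are assumptions for illustration; the project's actual framing code is not shown in this document.

```python
from typing import Iterable, Iterator

BOUNDARY = "frame"  # hypothetical boundary token
MJPEG_MEDIA_TYPE = f"multipart/x-mixed-replace; boundary={BOUNDARY}"


def mjpeg_part(jpeg: bytes) -> bytes:
    """Wrap one encoded JPEG image as a single multipart body part."""
    header = (
        f"--{BOUNDARY}\r\n"
        "Content-Type: image/jpeg\r\n"
        f"Content-Length: {len(jpeg)}\r\n\r\n"
    ).encode("ascii")
    return header + jpeg + b"\r\n"


def mjpeg_stream(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield one multipart part per JPEG frame, in order."""
    for jpeg in frames:
        yield mjpeg_part(jpeg)
```

In a FastAPI backend, a generator like `mjpeg_stream` would typically be passed to `StreamingResponse` with `media_type=MJPEG_MEDIA_TYPE`. Because the bounding boxes are drawn onto each frame before JPEG encoding, the client needs no separate overlay logic.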