handsfree-ai

Built to survive IKEA.

Phone on a tripod. AI watches. You build. Fully open-source DIY assistant — no cloud, no subscription, no hands required.

License: MIT · Python 3.11+ · Ollama


The Problem

You're 45 minutes into an IKEA KALLAX build. You need to flip a cam lock. Your hands are covered in sawdust. You have no idea which direction "clockwise" is anymore.

You could:

  • Take off your gloves, unlock your phone, google it, get sawdust on the screen
  • Or just say "hey assistant, which way does this go?" and keep building

That's it. That's the whole product.


How It Works

┌─────────────────────────────────────────────────────────────┐
│  Phone (browser)  →  /ws/intake  →  LLaVA (Ollama)         │
│                                          ↓                   │
│  Microphone  →  faster-whisper  →  Intent Router            │
│                                          ↓                   │
│                              Session Manager (steps)         │
│                                          ↓                   │
│                              /ws/analysis  →  Browser        │
└─────────────────────────────────────────────────────────────┘
  1. Your phone camera streams frames over WebSocket to the server
  2. Your voice is captured via faster-whisper with silero-vad gating
  3. Say "hey assistant" — the hotword wakes it
  4. LLaVA (running locally via Ollama) analyzes the frame + your current step
  5. Guidance pushes back to your phone instantly

Zero cloud. Zero subscriptions. Everything runs on your home rig.
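
Step 4 is a plain HTTP request to the Ollama instance on your LAN. A rough sketch of what that frame-analysis call could look like, using Ollama's standard /api/generate endpoint (the function, prompt wording, and error handling are illustrative, not the repo's actual provider code):

# sketch: send one camera frame + the current step to the local LLaVA model
import base64
import requests

OLLAMA_HOST = "http://localhost:11434"  # same value as OLLAMA_HOST in .env

def analyze_frame(jpeg_bytes: bytes, current_step: str) -> str:
    payload = {
        "model": "llava:13b",
        "prompt": f"The builder is on this step: {current_step}. "
                  "Describe what you see and what they should do next.",
        "images": [base64.b64encode(jpeg_bytes).decode()],  # Ollama takes base64 images
        "stream": False,
    }
    resp = requests.post(f"{OLLAMA_HOST}/api/generate", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]  # the model's guidance, pushed back to the phone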


Features

Completely Private

All inference runs locally via Ollama. No image ever leaves your network. Not even a ping to an external API.

Built for Noisy Environments

Uses silero-vad instead of webrtcvad — significantly more robust in reverberant workshops with background noise (fans, music, power tools). Configurable sensitivity from 0 (strict) to 3 (permissive).

# Crank sensitivity for a loud workshop
VAD_SENSITIVITY=3 python -m uvicorn main:app
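
Internally the sensitivity setting boils down to a speech-probability threshold for silero-vad. A rough sketch of that mapping with the silero-vad package (the threshold values here are illustrative guesses, not the project's actual numbers):

# sketch: map VAD_SENSITIVITY (0-3) onto a silero-vad speech threshold
from silero_vad import load_silero_vad, read_audio, get_speech_timestamps

# 0 = strict (high threshold), 3 = permissive (low threshold) -- illustrative values
THRESHOLDS = {0: 0.85, 1: 0.70, 2: 0.50, 3: 0.35}

model = load_silero_vad()
wav = read_audio("workshop_clip.wav", sampling_rate=16000)
segments = get_speech_timestamps(wav, model, threshold=THRESHOLDS[3], sampling_rate=16000)
print(f"{len(segments)} speech segments survived the workshop noise")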

Actually Hands-Free

  • Mic hot indicator pulses green on the phone browser when voice is actively detected
  • Safety banner auto-triggers and pauses the session if LLaVA flags a hazard
  • Step navigation entirely by voice: "next step," "go back," "pause"
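
Under the hood, voice navigation only needs to map a transcribed phrase onto a session action. A minimal sketch of that kind of routing (command table and function are illustrative, not the project's actual Intent Router):

# sketch: route a transcribed phrase to a session action
COMMANDS = {
    "next step": "advance",
    "go back": "rewind",
    "pause": "pause",
}

def route(transcript: str) -> str | None:
    text = transcript.lower()
    for phrase, action in COMMANDS.items():
        if phrase in text:
            return action
    return None  # no command found: treat it as a free-form question for LLaVA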

Expandable via Packs

The intelligence lives in packs/. Each pack is one Python file that defines the domain knowledge for a specific task type. Swap packs without restarting the server.


Quickstart

Prerequisites

  • Python 3.11+
  • Ollama installed and running
  • A GPU (CPU works, just slow — LLaVA is chunky)
# 1. Pull the vision model
ollama pull llava:13b

# 2. Clone and install
git clone https://github.com/ninja-otaku/handsfree-ai
cd handsfree-ai
pip install -r requirements.txt

# 3. Configure
cp .env.example .env
# Edit .env: set OLLAMA_HOST, WHISPER_MODEL, etc.

# 4. Run
uvicorn main:app --host 0.0.0.0 --port 8000

# 5. Open on your phone
# http://YOUR_RIG_IP:8000
# Mount phone on tripod. Point at your work. Say "hey assistant".

Configuration

All settings live in .env:

Variable          Default                  Description
OLLAMA_HOST       http://localhost:11434   Ollama instance URL
VISION_MODEL      llava:13b                Any Ollama multimodal model
WHISPER_MODEL     tiny                     tiny / base / small / medium
HOTWORD           hey assistant            Wake phrase (case-insensitive)
VAD_SENSITIVITY   2                        0 = strict, 3 = permissive
ACTIVE_PACK       ikea                     Pack name (filename without .py)
PORT              8000                     Server port
TLS_ENABLED       false                    Enable HTTPS (required for camera on non-localhost)

Camera note: Browsers require HTTPS to access getUserMedia from non-localhost origins. Set TLS_ENABLED=true and provide cert paths, or proxy behind nginx with a self-signed cert for LAN use.
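
config.py holds the Pydantic settings (see Project Structure below), so the table above roughly corresponds to a settings class like this sketch (defaults mirror the table; the class layout itself is an assumption):

# sketch: .env-backed settings, mirroring the table above
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    ollama_host: str = "http://localhost:11434"
    vision_model: str = "llava:13b"
    whisper_model: str = "tiny"
    hotword: str = "hey assistant"
    vad_sensitivity: int = 2
    active_pack: str = "ikea"
    port: int = 8000
    tls_enabled: bool = False

settings = Settings()  # reads OLLAMA_HOST, VISION_MODEL, ... from .env or the environment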


The Packs Architecture

This is the growth engine of the project.

packs/
├── schema.json          # JSON Schema — all packs validated against this
├── base_pack.py         # Abstract base class
├── PACK_TEMPLATE.py     # 58-line annotated template ← start here
├── ikea.py              # IKEA furniture assembly
└── your_domain.py       # ← your contribution

A pack is a Python class that does two things:

class Pack(BasePack):
    metadata = {
        "name": "IKEA Assembly",
        "version": "1.0.0",
        "domain": "furniture",
        "description": "...",
        "safety_keywords": ["tip over", "two person", "sharp edge"]
    }

    def system_prompt(self) -> str:
        return """You are an IKEA assembly assistant.
        ...your domain knowledge here...
        """

That's it. The framework handles validation, safety interrupts, session state, WebSocket broadcasting, and voice routing automatically.
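
Validation is driven by packs/schema.json. A minimal sketch of what BasePack.validate() could amount to, assuming jsonschema is used to check the metadata dict (the real base_pack.py may differ):

# sketch: validate a pack's metadata against packs/schema.json
import json
import pathlib
import jsonschema

class BasePack:
    metadata: dict = {}

    def system_prompt(self) -> str:
        raise NotImplementedError

    def validate(self) -> None:
        schema_path = pathlib.Path(__file__).parent / "schema.json"
        schema = json.loads(schema_path.read_text())
        jsonschema.validate(instance=self.metadata, schema=schema)  # raises on a bad pack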

Activate a pack via API

curl -X POST http://localhost:8000/packs/activate \
  -H "Content-Type: application/json" \
  -d '{"pack": "ikea"}'

Domains that need packs (PRs welcome)

  • 3d_printing — layer adhesion, support removal, bed leveling
  • car_maintenance — oil changes, brake pads, filter swaps
  • electronics_repair — soldering guidance, component ID, ESD warnings
  • cooking — mise en place timing, temperature checks, technique cues
  • woodworking — joint alignment, grain direction, finishing sequences
  • plumbing — valve directions, fitting types, leak checks

Contributing a Pack

# 1. Copy the template
cp packs/PACK_TEMPLATE.py packs/your_domain.py

# 2. Fill in metadata and system_prompt()
# The template has 6 clearly marked TODOs

# 3. Validate
python -c "from packs.your_domain import Pack; p=Pack(); p.validate(); print('OK')"

# 4. Test live
curl -X POST http://localhost:8000/packs/activate \
  -H "Content-Type: application/json" \
  -d '{"pack": "your_domain"}'

# 5. Open a PR

The only requirement: your system_prompt() must tell LLaVA how to reason about your domain's safety keywords. Everything else is flexible.
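
For instance, a hypothetical packs/woodworking.py could look like this (entirely illustrative; only the structure matches the template):

# packs/woodworking.py -- hypothetical example, not in the repo
from packs.base_pack import BasePack

class Pack(BasePack):
    metadata = {
        "name": "Woodworking",
        "version": "0.1.0",
        "domain": "woodworking",
        "description": "Joint alignment, grain direction, finishing sequences",
        "safety_keywords": ["kickback", "blade guard", "push stick"]
    }

    def system_prompt(self) -> str:
        return """You are a woodworking assistant watching a live camera feed.
        Watch for kickback risk, a missing blade guard, or hands where a
        push stick should be; if you see any of these, tell the user to stop
        and pause the session. Otherwise, guide them through the current step."""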


Project Structure

handsfree-ai/
├── main.py                    # FastAPI app, WebSocket hub, voice loop
├── config.py                  # Pydantic settings
├── intake/
│   └── voice_intake.py        # faster-whisper + silero-vad daemon
├── engine/
│   └── session_manager.py     # Step state machine (IDLE/ACTIVE/PAUSED/COMPLETED)
├── providers/
│   └── ollama_vision.py       # LLaVA inference + JSON extraction
├── packs/
│   ├── schema.json
│   ├── base_pack.py
│   ├── PACK_TEMPLATE.py
│   └── ikea.py
└── static/
    └── index.html             # Phone-optimized PWA UI

Tech Stack

Layer            Choice                  Why
Vision           LLaVA 13B via Ollama    Free, local, surprisingly capable
Speech-to-text   faster-whisper          4× faster than openai-whisper, same accuracy
VAD              silero-vad              Survives workshops; webrtcvad doesn't
Backend          FastAPI + WebSockets    Async pub/sub for multi-client broadcast
Frontend         Vanilla JS              No build step; phone browser just works
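
The "async pub/sub for multi-client broadcast" row boils down to a small hub that fans analysis results out to every connected browser. A minimal sketch of that pattern with FastAPI (illustrative; the real hub in main.py also handles /ws/intake and the voice loop):

# sketch: fan analysis results out to every connected /ws/analysis client
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
clients: set[WebSocket] = set()

@app.websocket("/ws/analysis")
async def analysis_ws(ws: WebSocket):
    await ws.accept()
    clients.add(ws)
    try:
        while True:
            await ws.receive_text()  # keep the socket open; guidance flows the other way
    except WebSocketDisconnect:
        clients.discard(ws)

async def broadcast(message: str) -> None:
    for ws in list(clients):
        await ws.send_text(message)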

Roadmap

  • v1.1 — Bluetooth footpedal as hardware "next step" trigger
  • v1.1 — Pack marketplace index (community-submitted schema.json registry)
  • v1.2 — Step photo capture — auto-photograph each completed step for your records
  • v1.2 — LLaVA model hot-swap without server restart
  • v2.0 — Offline STT fallback (Vosk) for fully air-gapped use

License

MIT. Build whatever you want. A star is appreciated.
