A voice-activated AI assistant for Windows. Say "Jarvis", speak your command, and it figures out what to do — open apps, search the web, analyze screenshots, scaffold code, etc.
- Wake word — Vosk listens offline for "jarvis" on a continuous audio stream
- Command recording — faster-whisper transcribes your speech after the wake word
- AI dispatch — Claude Haiku parses the transcript and returns a JSON action object
- Execution — the action is routed to the appropriate handler
- TTS — response is spoken via Windows SAPI; saying the wake word mid-speech interrupts it
bash setup.shThis installs dependencies, downloads the Vosk model, creates folders, and writes a .env template.
Then fill in .env:
ANTHROPIC_API_KEY=sk-ant-...
BRAVE_API_KEY=BSA... # optional — enables real-time web search
Run:
python jarvis.py| Command type | Example |
|---|---|
| Open a workspace | "Open my coding workspace" |
| Open a URL | "Open github" |
| Launch an app | "Open Spotify" |
| Search the web | "Search YouTube for lo-fi beats" |
| Find and open files | "Find files named resume" |
| Open a bookmark | "Open my Notion bookmark" |
| Take and analyze a screenshot | "Take a screenshot and tell me what this place is" |
| Analyze an image | "What's in my input folder?" |
| Web search (live) | "What's the weather in New York?" |
| Generate a file | "Create a file about the history of programming" |
| Scaffold a project | "Code hello world" |
| Chat | "What is the difference between a thread and process?" |
Voice pipeline
- Offline wake word detection with Vosk (no cloud round-trip)
- WebRTC VAD for noise-resistant silence detection
- Interrupt TTS mid-sentence by saying the wake word again
Dashboard
- React UI at
http://localhost:5151showing live state, transcript, and generated files
Phone interface
- Tap-to-speak web page served over Tailscale HTTPS — send commands from your phone without being near your PC
Workspaces
- Named groups of URLs, apps, VS Code projects, and files defined in
workspaces.json - Edit them via the React workspace manager at
http://localhost:5151/workspaces— saves directly toworkspaces.jsonand hot-reloads voice triggers without restarting
File & browser integration
- Indexes local files at startup for fast fuzzy search
- Reads bookmarks from Chrome/Edge profiles
Image I/O
- Drop images into
jarvis_input/and ask Jarvis to analyze them - Generated images and files go to
jarvis_output/
anthropic python-dotenv pyautogui pygetwindow Pillow
sounddevice numpy faster-whisper soundfile vosk
requests flask flask-cors webrtcvad
Optional: realesrgan-ncnn-vulkan.exe for image upscaling (update ESRGAN_EXE in jarvis.py).