JARVIS

A voice-activated AI assistant for Windows. Say "Jarvis", speak your command, and it decides what to do: open apps, search the web, analyze screenshots, scaffold code, and more.

How it works

1. Wake word: Vosk listens offline for "jarvis" on a continuous audio stream
2. Command recording: faster-whisper transcribes your speech after the wake word
3. AI dispatch: Claude Haiku parses the transcript and returns a JSON action object
4. Execution: the action is routed to the appropriate handler
5. TTS: the response is spoken via Windows SAPI; saying the wake word mid-speech interrupts it
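
The dispatch and execution steps above can be sketched roughly like this. It is a minimal illustration, not the code in jarvis.py: the action schema (an "action" key plus keyword arguments) and the handler names open_url and chat are assumptions made for the example.

```python
import json

# Hypothetical handlers; real ones would launch apps, run searches, etc.
def open_url(url):
    return f"opening {url}"

def chat(text):
    return text

# Map action names (assumed schema) to handler functions.
HANDLERS = {"open_url": open_url, "chat": chat}

def dispatch(raw_json):
    """Parse the model's JSON reply and route it to the matching handler."""
    try:
        action = json.loads(raw_json)
    except json.JSONDecodeError:
        return "Sorry, I could not understand that."
    if not isinstance(action, dict):
        return "Sorry, I could not understand that."
    name = action.pop("action", None)
    handler = HANDLERS.get(name)
    if handler is None:
        return f"Unknown action: {name}"
    # Remaining keys are passed to the handler as keyword arguments.
    return handler(**action)
```

Keeping the handlers in a plain dictionary means a new command type only needs one new function and one new entry; the returned string is what would be handed to TTS.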

Setup

bash setup.sh

This installs dependencies, downloads the Vosk model, creates folders, and writes a .env template.

Then fill in .env:

ANTHROPIC_API_KEY=sk-ant-...
BRAVE_API_KEY=BSA...        # optional — enables real-time web search
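
For illustration, here is one way a script could validate these two keys at startup. It is a sketch under assumptions: the function name load_config and the returned dictionary are hypothetical, and it expects the .env values to already be in the environment (for example after calling load_dotenv() from python-dotenv).

```python
import os

def load_config(env=os.environ):
    """Read the API keys from the environment and report what is enabled."""
    anthropic_key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not anthropic_key:
        # The Anthropic key is required for AI dispatch.
        raise RuntimeError("ANTHROPIC_API_KEY is missing; add it to .env")
    # The Brave key is optional; without it, live web search stays off.
    brave_key = env.get("BRAVE_API_KEY", "").strip()
    return {
        "anthropic_key": anthropic_key,
        "brave_key": brave_key or None,
        "web_search_enabled": bool(brave_key),
    }
```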

Run:

python jarvis.py

What you can say

Command type	Example
Open a workspace	"Open my coding workspace"
Open a URL	"Open github"
Launch an app	"Open Spotify"
Search the web	"Search YouTube for lo-fi beats"
Find and open files	"Find files named resume"
Open a bookmark	"Open my Notion bookmark"
Take and analyze a screenshot	"Take a screenshot and tell me what this place is"
Analyze an image	"What's in my input folder?"
Web search (live)	"What's the weather in New York?"
Generate a file	"Create a file about the history of programming"
Scaffold a project	"Code hello world"
Chat	"What is the difference between a thread and a process?"

Features

Voice pipeline

Offline wake word detection with Vosk (no cloud round-trip)
WebRTC VAD for noise-resistant silence detection
Interrupt TTS mid-sentence by saying the wake word again
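
To show how VAD output can end a recording, here is a small sketch of the silence-detection logic. It assumes each audio frame has already been classified as speech or not (for example with webrtcvad's Vad.is_speech(frame, sample_rate)); the function name and the 25-frame threshold are illustrative choices, not values taken from jarvis.py.

```python
def end_of_utterance(speech_flags, max_silent_frames=25):
    """Return the index of the frame where recording should stop, or None.

    speech_flags is an iterable of booleans, one per audio frame.
    Leading silence is ignored; the countdown starts only after speech.
    """
    silent_run = 0
    heard_speech = False
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= max_silent_frames:
                return i
    return None
```

With 30 ms frames, 25 consecutive silent frames correspond to roughly 0.75 s of silence. Requiring a run of silent frames, rather than stopping at the first one, keeps short pauses between words from cutting the command off.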

Dashboard

React UI at http://localhost:5151 showing live state, transcript, and generated files

Phone interface

Tap-to-speak web page served over Tailscale HTTPS — send commands from your phone without being near your PC

Workspaces

Named groups of URLs, apps, VS Code projects, and files defined in workspaces.json
Edit them via the React workspace manager at http://localhost:5151/workspaces — saves directly to workspaces.json and hot-reloads voice triggers without restarting
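
As a rough sketch of how hot-reloading could work, the class below rereads workspaces.json whenever its modification time changes. The file schema (a "triggers" list per workspace) and the class name WorkspaceStore are assumptions for the example, and polling the modification time is a stand-in technique; the project may trigger reloads differently, for instance directly from the save endpoint.

```python
import json
import os

class WorkspaceStore:
    def __init__(self, path="workspaces.json"):
        self.path = path
        self._mtime = None
        self.workspaces = {}

    def refresh(self):
        """Reload the file if it changed on disk; return True if reloaded."""
        mtime = os.path.getmtime(self.path)
        if mtime == self._mtime:
            return False
        with open(self.path, encoding="utf-8") as f:
            self.workspaces = json.load(f)
        self._mtime = mtime
        return True

    def match(self, transcript):
        """Return the first workspace whose trigger phrase is in the transcript."""
        self.refresh()
        text = transcript.lower()
        for name, workspace in self.workspaces.items():
            # "triggers" is an assumed key holding the spoken phrases.
            if any(t.lower() in text for t in workspace.get("triggers", [])):
                return name
        return None
```

Calling match() on every command means an edit saved from the workspace manager takes effect on the next spoken request, with no restart.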

File & browser integration

Indexes local files at startup for fast fuzzy search
Reads bookmarks from Chrome/Edge profiles
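
The index-then-search idea can be illustrated with the standard library alone. This sketch uses difflib for the fuzzy matching, which is an assumption: the source does not say which matcher jarvis.py uses, and the function names and the 0.4 cutoff are hypothetical.

```python
import difflib
import os

def build_index(root):
    """Walk root once and map lowercase file names to their full paths."""
    index = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            index.setdefault(name.lower(), []).append(os.path.join(folder, name))
    return index

def find_files(index, query, limit=5):
    """Return paths whose file names are closest to the spoken query."""
    names = difflib.get_close_matches(query.lower(), list(index), n=limit, cutoff=0.4)
    return [path for name in names for path in index[name]]
```

Building the index once at startup keeps each voice query to an in-memory lookup instead of a fresh disk walk.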

Image I/O

Drop images into jarvis_input/ and ask Jarvis to analyze them
Generated images and files go to jarvis_output/
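
A minimal sketch of how the newest image in jarvis_input/ might be selected for analysis. The function name and the set of accepted extensions are assumptions, not taken from jarvis.py.

```python
from pathlib import Path

# Assumed set of image extensions to consider.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}

def newest_image(folder="jarvis_input"):
    """Return the most recently modified image in the folder, or None."""
    images = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    return max(images, key=lambda p: p.stat().st_mtime, default=None)
```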

Dependencies

anthropic python-dotenv pyautogui pygetwindow Pillow
sounddevice numpy faster-whisper soundfile vosk
requests flask flask-cors webrtcvad

Optional: realesrgan-ncnn-vulkan.exe for image upscaling (update ESRGAN_EXE in jarvis.py).
