# JARVIS AI Voice Assistant
JARVIS is a mobile AI assistant built with a Node.js backend and an Expo React Native frontend. It supports real-time voice conversation, AI chat, image understanding, camera-based voice vision, conversation history, JWT authentication, MongoDB storage, and LiveKit-powered voice rooms.
## Features
- Voice interaction with LiveKit Agents
- Text chat with conversation history
- NVIDIA NIM AI responses
- NVIDIA vision/image analysis
- Voice-triggered camera vision (for example, “What is in my hand?” or “What is behind me?”)
- Auto camera capture for voice vision
- JWT authentication with refresh tokens
- MongoDB conversation, message, token, and voice session storage
- Swagger API documentation
- Sci-fi mobile UI with Jarvis-style voice/chat modes
- User sidebar with profile, settings, and history
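Voice-triggered camera vision depends on recognizing, from a voice transcript, that the user wants Jarvis to look through the camera. Below is a minimal sketch that assumes a simple regular-expression approach. `detectVisionIntent` and the phrase list are hypothetical illustrations, and the app's real detection logic may differ.

```javascript
// Hypothetical phrase list: an assumption for illustration, not the app's actual rules.
const VISION_PATTERNS = [
  /\bwhat(?:'s| is) in my hand\b/,
  /\bwhat(?:'s| is) behind me\b/,
  /\bwhat (?:do|can) you see\b/,
  /\blook at (?:this|that)\b/,
];

// Lowercase the text, turn curly apostrophes into straight ones, and collapse
// whitespace so small transcription differences do not break matching.
function normalizeTranscript(text) {
  return String(text ?? "")
    .toLowerCase()
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/\s+/g, " ")
    .trim();
}

// Returns true when the transcript looks like a request to use the camera.
function detectVisionIntent(transcript) {
  const text = normalizeTranscript(transcript);
  return VISION_PATTERNS.some((pattern) => pattern.test(text));
}
```

A pattern matcher like this runs synchronously on the device with no network call. A model-based classifier would recognize more phrasings, but it would add latency before the camera opens.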
## Tech Stack
| Layer | Technology |
|---|---|
| Backend | Node.js, Express.js |
| Database | MongoDB, Mongoose |
| Auth | JWT, bcrypt |
| AI | NVIDIA NIM |
| Realtime Voice | LiveKit Agents |
| Mobile App | React Native, Expo |
| Camera Vision | expo-camera, expo-image-picker |
| API Docs | Swagger / OpenAPI |
| Security | Helmet, CORS, Rate Limit |
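The Auth row pairs JWT with refresh tokens, and the backend `.env` includes `JWT_SECRET` and `JWT_REFRESH_EXPIRES_IN=7d`. The sketch below illustrates the access/refresh token idea using only `node:crypto` (HS256). The function names, the 15-minute access lifetime, and the separate refresh secret are assumptions. This is not the project's actual implementation, which may rely on a JWT library. Password hashing with bcrypt is out of scope here.

```javascript
const crypto = require("node:crypto");

function base64url(input) {
  return Buffer.from(input).toString("base64url");
}

// Sign a payload as an HS256 JWT that expires `expiresInSeconds` from now.
function signToken(payload, secret, expiresInSeconds) {
  const header = { alg: "HS256", typ: "JWT" };
  const now = Math.floor(Date.now() / 1000);
  const body = { ...payload, iat: now, exp: now + expiresInSeconds };
  const unsigned = `${base64url(JSON.stringify(header))}.${base64url(JSON.stringify(body))}`;
  const signature = crypto.createHmac("sha256", secret).update(unsigned).digest("base64url");
  return `${unsigned}.${signature}`;
}

// Verify the signature and expiry; returns the payload or throws.
function verifyToken(token, secret) {
  const parts = String(token).split(".");
  if (parts.length !== 3) throw new Error("Malformed token");
  const [headerPart, bodyPart, signaturePart] = parts;

  const header = JSON.parse(Buffer.from(headerPart, "base64url").toString("utf8"));
  if (header.alg !== "HS256") throw new Error("Unexpected algorithm");

  const expected = crypto
    .createHmac("sha256", secret)
    .update(`${headerPart}.${bodyPart}`)
    .digest();
  const actual = Buffer.from(signaturePart, "base64url");
  // Constant-time comparison; timingSafeEqual requires equal lengths.
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    throw new Error("Invalid signature");
  }

  const payload = JSON.parse(Buffer.from(bodyPart, "base64url").toString("utf8"));
  if (typeof payload.exp !== "number" || payload.exp <= Math.floor(Date.now() / 1000)) {
    throw new Error("Token expired");
  }
  return payload;
}

// Short-lived access token plus a 7-day refresh token (mirroring JWT_REFRESH_EXPIRES_IN=7d).
// Hypothetical helper: the 15-minute access lifetime and separate secrets are assumptions.
function issueTokenPair(userId, accessSecret, refreshSecret) {
  return {
    accessToken: signToken({ sub: userId, type: "access" }, accessSecret, 15 * 60),
    refreshToken: signToken({ sub: userId, type: "refresh" }, refreshSecret, 7 * 24 * 60 * 60),
  };
}
```

In an Express backend, `verifyToken` would typically run inside an auth middleware that reads the `Authorization: Bearer` header. A maintained JWT library is usually the safer choice in production.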
## Project Structure
```txt
jarvis/
  controllers/
  routes/
  models/
  services/
  agents/
  docs/
  middleware/
  server.js
natvie/
  App.js
  src/
    components/
    screens/
    services/
    styles/
  assets/
```

## Backend Setup

```bash
cd jarvis
npm install
```

Create a `.env` file:
```env
PORT=5000
MONGO_URI=mongodb://127.0.0.1:27017/jarvis
JWT_SECRET=your-secret
JWT_REFRESH_EXPIRES_IN=7d
ASSISTANT_PROVIDER=nvidia
NVIDIA_API_KEY=your-nvidia-api-key
NVIDIA_NIM_BASE_URL=https://integrate.api.nvidia.com
NVIDIA_NIM_MODEL=nvidia/llama-3.1-nemotron-nano-8b-v1
NVIDIA_VISION_MODEL=nvidia/llama-3.1-nemotron-nano-vl-8b-v1
LIVEKIT_URL=your-livekit-url
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
LIVEKIT_AGENT_NAME=jarvis-agent
```

Run the backend and the voice agent together:
```bash
npm run dev:all
```

Backend URL: `http://127.0.0.1:5000`

Swagger docs: `http://127.0.0.1:5000/api-docs`

## Mobile Setup

```bash
cd natvie
npm install
npm run start:dev-client
```

For native modules such as LiveKit, the camera, and the image picker, use a development build:
```bash
npm run android:eas
```

## API Endpoints

```txt
POST /api/v1/auth/register
POST /api/v1/auth/login
GET  /api/v1/auth/me
POST /api/v1/chat/conversations
GET  /api/v1/chat/conversations
POST /api/v1/chat/conversations/:id/messages
POST /api/v1/chat/conversations/:id/image-messages
POST /api/v1/chat/conversations/:id/voice-transcripts
POST /api/v1/livekit/token
GET  /api/v1/livekit/config
```

## Voice Vision Flow

```txt
User says: "What is in my hand?"
↓
LiveKit transcript is received
↓
Mobile app detects vision intent
↓
Camera opens inside the app
↓
Photo is captured automatically
↓
Image is sent to NVIDIA Vision NIM
↓
Answer is saved in MongoDB
↓
Jarvis speaks the answer
```

## Scripts

Backend:
```bash
npm run dev
npm run dev:all
npm run agent:dev
npm test
```

Mobile:

```bash
npm start
npm run start:dev-client
npm run android:eas
```

## Status

The project currently supports:
- User login/register
- Chat mode
- Voice mode
- LiveKit voice agent
- NVIDIA NIM chat
- NVIDIA vision image analysis
- Auto camera capture from voice command
- MongoDB history storage
- Swagger documentation

This project is for learning and personal AI assistant development.