thunderavi/Jarvis_native

# JARVIS AI Voice Assistant

JARVIS is a mobile AI assistant built with a Node.js backend and an Expo React Native frontend. It supports realtime voice conversation, AI chat, image understanding, camera-based voice vision, conversation history, JWT authentication, MongoDB storage, and LiveKit-powered voice rooms.

## Features

- Voice interaction with LiveKit Agents
- Text chat with conversation history
- NVIDIA NIM AI responses
- NVIDIA vision/image analysis
- Voice-triggered camera vision  
  Example: “What is in my hand?” or “What is behind me?”
- Auto camera capture for voice vision
- JWT authentication with refresh tokens
- MongoDB conversation, message, token, and voice session storage
- Swagger API documentation
- Sci-fi mobile UI with Jarvis-style voice/chat modes
- User sidebar with profile, settings, and history

## Tech Stack

| Layer | Technology |
|---|---|
| Backend | Node.js, Express.js |
| Database | MongoDB, Mongoose |
| Auth | JWT, bcrypt |
| AI | NVIDIA NIM |
| Realtime Voice | LiveKit Agents |
| Mobile App | React Native, Expo |
| Camera Vision | expo-camera, expo-image-picker |
| API Docs | Swagger / OpenAPI |
| Security | Helmet, CORS, Rate Limit |

## Project Structure

```txt
jarvis/
  controllers/
  routes/
  models/
  services/
  agents/
  docs/
  middleware/
  server.js

natvie/
  App.js
  src/
    components/
    screens/
    services/
    styles/
  assets/
```

## Backend Setup

```bash
cd jarvis
npm install
```

Create a `.env` file:

```env
PORT=5000
MONGO_URI=mongodb://127.0.0.1:27017/jarvis
JWT_SECRET=your-secret
JWT_REFRESH_EXPIRES_IN=7d

ASSISTANT_PROVIDER=nvidia
NVIDIA_API_KEY=your-nvidia-api-key
NVIDIA_NIM_BASE_URL=https://integrate.api.nvidia.com
NVIDIA_NIM_MODEL=nvidia/llama-3.1-nemotron-nano-8b-v1
NVIDIA_VISION_MODEL=nvidia/llama-3.1-nemotron-nano-vl-8b-v1

LIVEKIT_URL=your-livekit-url
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
LIVEKIT_AGENT_NAME=jarvis-agent
```
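A missing key in `.env` tends to surface later as a confusing runtime error, so it can help to validate the configuration at startup. The following is a minimal sketch, not code from this repository: `loadConfig` and the `REQUIRED_KEYS` list are hypothetical, and which variables are truly mandatory is an assumption.

```javascript
// Hypothetical startup check; not part of the repository.
// Assumption: these keys are mandatory for the backend and voice agent.
const REQUIRED_KEYS = [
  "MONGO_URI",
  "JWT_SECRET",
  "NVIDIA_API_KEY",
  "LIVEKIT_URL",
  "LIVEKIT_API_KEY",
  "LIVEKIT_API_SECRET",
];

function loadConfig(env = process.env) {
  // Collect every missing or blank key so the error lists them all at once.
  const missing = REQUIRED_KEYS.filter(
    (key) => env[key] === undefined || String(env[key]).trim() === ""
  );
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`
    );
  }

  // PORT falls back to 5000, the value used in the example .env above.
  const port = Number.parseInt(env.PORT ?? "5000", 10);
  if (Number.isNaN(port)) {
    throw new Error(`PORT must be a number, got "${env.PORT}"`);
  }

  return {
    port,
    mongoUri: env.MONGO_URI,
    jwtSecret: env.JWT_SECRET,
    nvidiaApiKey: env.NVIDIA_API_KEY,
    livekit: {
      url: env.LIVEKIT_URL,
      apiKey: env.LIVEKIT_API_KEY,
      apiSecret: env.LIVEKIT_API_SECRET,
    },
  };
}
```

Calling `loadConfig()` once before the server starts listening would make a misconfigured environment fail fast with a single readable message.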

Run backend and voice agent together:

```bash
npm run dev:all
```

Backend URL:

```txt
http://127.0.0.1:5000
```

Swagger docs:

```txt
http://127.0.0.1:5000/api-docs
```

## Mobile App Setup

```bash
cd natvie
npm install
npm run start:dev-client
```

For native modules like LiveKit, camera, and image picker, use a development build:

```bash
npm run android:eas
```

## Main API Endpoints

```txt
POST   /api/v1/auth/register
POST   /api/v1/auth/login
GET    /api/v1/auth/me

POST   /api/v1/chat/conversations
GET    /api/v1/chat/conversations
POST   /api/v1/chat/conversations/:id/messages
POST   /api/v1/chat/conversations/:id/image-messages
POST   /api/v1/chat/conversations/:id/voice-transcripts

POST   /api/v1/livekit/token
GET    /api/v1/livekit/config
```
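As a rough illustration of calling these endpoints from a client, the sketch below builds an authenticated request and posts a chat message. It is not taken from the repository. The helper names (`buildRequest`, `sendMessage`), the `Bearer` authorization scheme, and the `content` body field are all assumptions; the Swagger docs are the authoritative source for the real request shapes.

```javascript
// Hypothetical client helpers; not part of the repository.
// BASE_URL matches the local backend URL from the setup section.
const BASE_URL = "http://127.0.0.1:5000";

// Builds the URL and fetch options for a JSON API call.
// Assumption: the API expects "Authorization: Bearer <jwt>".
function buildRequest(method, path, { token, body } = {}) {
  const headers = { Accept: "application/json" };
  if (body !== undefined) headers["Content-Type"] = "application/json";
  if (token) headers.Authorization = `Bearer ${token}`;
  return {
    url: `${BASE_URL}${path}`,
    options: {
      method,
      headers,
      body: body === undefined ? undefined : JSON.stringify(body),
    },
  };
}

// Posts a text message to an existing conversation.
// Assumption: the message text is sent in a "content" field.
async function sendMessage(conversationId, text, token) {
  const path = `/api/v1/chat/conversations/${encodeURIComponent(
    conversationId
  )}/messages`;
  const { url, options } = buildRequest("POST", path, {
    token,
    body: { content: text },
  });
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

Keeping request construction in a pure function separates it from the network call, so header and path handling can be checked without a running backend.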

## Voice Vision Flow

```txt
User says: "What is in my hand?"
        ↓
LiveKit transcript is received
        ↓
Mobile app detects vision intent
        ↓
Camera opens inside the app
        ↓
Photo is captured automatically
        ↓
Image is sent to NVIDIA Vision NIM
        ↓
Answer is saved in MongoDB
        ↓
Jarvis speaks the answer
```
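The flow above can be sketched roughly as follows. This is an illustrative outline, not the app's actual implementation: the keyword patterns, `detectVisionIntent`, `handleTranscript`, and the injected `deps` functions are all hypothetical, and the real project may detect intent differently or save the answer on the backend instead of the client.

```javascript
// Hypothetical vision-intent detection; not part of the repository.
// Assumption: simple keyword patterns are enough to spot vision questions.
const VISION_PATTERNS = [
  /\bwhat(?:'s| is)\b.*\b(?:in my hand|behind me|in front of me)\b/i,
  /\bwhat (?:am i|are you) (?:holding|looking at|seeing)\b/i,
  /\b(?:look at|can you see|do you see)\b/i,
];

function detectVisionIntent(transcript) {
  if (typeof transcript !== "string") return false;
  // Normalize curly apostrophes so "what’s" matches the patterns.
  const normalized = transcript.replace(/[\u2018\u2019]/g, "'").trim();
  return VISION_PATTERNS.some((pattern) => pattern.test(normalized));
}

// Runs the capture, analyze, save, and speak steps in order.
// `deps` holds injected functions standing in for the camera, the vision
// request, the storage call, and the voice output.
async function handleTranscript(transcript, deps) {
  if (!detectVisionIntent(transcript)) return null;
  const photo = await deps.capturePhoto();
  const answer = await deps.analyzeImage(photo, transcript);
  await deps.saveAnswer(transcript, answer);
  await deps.speak(answer);
  return answer;
}
```

Passing the camera, vision, storage, and speech steps in as functions keeps the ordering logic independent of any particular SDK, which makes it easy to exercise with stubs.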

## Scripts

Backend:

```bash
npm run dev
npm run dev:all
npm run agent:dev
npm test
```

Mobile:

```bash
npm start
npm run start:dev-client
npm run android:eas
```

## Status

The project currently supports:

- User login/register
- Chat mode
- Voice mode
- LiveKit voice agent
- NVIDIA NIM chat
- NVIDIA vision image analysis
- Auto camera capture from voice command
- MongoDB history storage
- Swagger documentation

## License

This project is for learning and personal AI assistant development.
