
Nocturne AI

A real-time audio chat interface powered by OpenAI's Realtime API and ElevenLabs text-to-speech. Nocturne AI provides low-latency conversational AI with audio processing, mixing, visual effects, and MIDI control capabilities.

Features

  • Real-time Voice Chat: Low-latency conversational AI using OpenAI's GPT-4o Realtime model
  • Professional Audio Processing: Reverb, delay, compression, EQ, and distortion effects powered by Tuna.js (see the sketch after this list)
  • Audio Mixing: Multi-channel audio mixer with voice control and panoramic effects
  • Text-to-Speech: High-quality synthesis via ElevenLabs with multiple voice options
  • Speech-to-Text: Real-time transcription using ElevenLabs STT
  • 3D Visualization: Interactive particle system and audio visualizers built with Three.js and React Three Fiber
  • MIDI Support: Full MIDI controller integration for parameter control
  • Transcript Display: Real-time conversation transcript with visual effects
  • Export/Import: Save and load audio configuration and MIDI parameters
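
The effects chain in audiofx.ts is project-specific, but as a minimal sketch of the general Tuna.js pattern (assuming the tunajs npm package, which ships without type declarations), effects are constructed from a Tuna instance and wired between an audio source and the destination:

// Minimal sketch of a Tuna.js delay + compressor chain; the actual chain
// in src/app/audiofx.ts may use different effects and parameter values.
// @ts-ignore -- tunajs ships without type declarations
import Tuna from "tunajs";

export function buildEffectsChain(ctx: AudioContext, source: AudioNode): void {
  const tuna = new Tuna(ctx);

  const delay = new tuna.Delay({
    delayTime: 150,  // milliseconds
    feedback: 0.4,
    wetLevel: 0.5,
    dryLevel: 1,
    cutoff: 2000,    // low-pass cutoff (Hz) applied to the wet signal
    bypass: 0,
  });

  const compressor = new tuna.Compressor({
    threshold: -20,  // dB
    ratio: 4,
    attack: 5,       // ms
    release: 250,    // ms
    makeupGain: 1,
    automakeup: true,
    bypass: 0,
  });

  // source -> delay -> compressor -> speakers
  source.connect(delay);
  delay.connect(compressor);
  compressor.connect(ctx.destination);
}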

Tech Stack

Frontend:

  • Next.js (React + TypeScript)
  • Three.js and React Three Fiber - 3D visualization
  • Tuna.js - Audio effects processing
  • Web Audio API and Web MIDI API - Mixing, playback, and controller input

Backend/APIs:

  • OpenAI Realtime API (GPT-4o Realtime) - Conversational AI
  • ElevenLabs API - Text-to-speech and speech-to-text

Python Backend (Optional):

  • OpenAI Whisper - Speech recognition
  • PyTorch - ML framework
  • PySimpleGUI - Desktop UI

Getting Started

Prerequisites

  • Node.js 20+
  • npm or yarn package manager
  • OpenAI API key (for Realtime API access)
  • ElevenLabs API key (for TTS/STT services)

Installation

  1. Clone the repository:
git clone https://github.com/HanYangZhao/NocturneAI.git
cd NocturneAI
  2. Install dependencies:
npm install
  3. Set up environment variables:
cp env.example .env.local
  4. Add your API keys to .env.local:
OPENAI_API_KEY=your_openai_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

Running the Development Server

npm run dev

Open http://localhost:3000 in your browser to access the application.

The app will hot-reload as you make changes to the code.

Building for Production

npm run build
npm start

Configuration

  • Voices: Edit src/app/voices.json to customize available voice options (see the sketch after this list)
  • Effects: Configure audio effects in the UI or via MIDI controller
  • Visual Settings: Adjust particle brightness and text display speed in the transcript panel
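
The schema of voices.json is defined by the app itself; purely as an illustration, an entry might pair a display name with an ElevenLabs voice ID. The field names below are hypothetical, not taken from the actual file:

// Hypothetical shape of a voices.json entry -- field names are
// illustrative only; check src/app/voices.json for the real schema.
interface VoiceOption {
  name: string;    // label shown in the voice picker
  voiceId: string; // ElevenLabs voice ID used for TTS requests
}

const exampleVoices: VoiceOption[] = [
  { name: "Narrator", voiceId: "your_elevenlabs_voice_id" },
];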

API Routes

  • POST /api/tts - Text-to-speech synthesis (see the sketch after this list)
  • POST /api/stt/elevenlabs-token - Get ElevenLabs STT token
  • POST /api/ephemeral - Get OpenAI Realtime ephemeral token
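
As a rough sketch of how the TTS route might be called from the client (the request body shape and binary audio response here are assumptions, not a documented contract):

// Hedged sketch: POST text to /api/tts and play the returned audio.
// The { text } JSON body and binary audio response are assumptions.
async function speak(text: string): Promise<void> {
  const res = await fetch("/api/tts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);

  const ctx = new AudioContext();
  const buffer = await ctx.decodeAudioData(await res.arrayBuffer());
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();
}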

Project Structure

src/
├── app/
│   ├── AudioChatClean.tsx       # Main chat component
│   ├── audiofx.ts               # Audio effects chain
│   ├── audioMixer.ts            # Multi-channel mixer
│   ├── midi.tsx                 # MIDI controller support
│   ├── voices.json              # Voice configurations
│   ├── api/                     # Backend API routes
│   └── scribe/                  # Real-time transcription
├── docs/                        # Documentation
└── python/                      # Optional Python backend

Documentation

Additional documentation lives in the docs/ directory.

Development

Linting

npm run lint

Type Checking

npx tsc --noEmit

License

See the LICENSE file for details.

Support

For issues and feature requests, please open an issue on the GitHub repository.
