Web and Node.js SDK for on-device KittenTTS speech synthesis.
Generate speech in browsers and Node.js without sending text to a cloud TTS API.
Developer preview. APIs may change between releases.
Browser apps use ONNX Runtime Web and browser storage. Node.js apps use ONNX Runtime Web with filesystem storage by default.
Browser ONNX Runtime wasm assets are loaded from the matching ONNX Runtime Web CDN by default. For production apps that need CDN independence or stricter supply-chain controls, self-host those ONNX Runtime assets and set ortWasmPath.
Web · Plain HTML example with local speech generation and playback
KittenTTS Web lets you add local speech synthesis to browser and Node.js apps:
- Text-to-speech - neural voice synthesis from plain text.
- On-device inference - powered by KittenTTS and ONNX Runtime Web.
- Private by default - no cloud TTS request after assets are available.
- Offline-ready - download once into browser or filesystem cache, or provide preloaded model bytes.
- App-friendly output - play audio directly, save WAV or MP3 data, stream longer text, or use generated word timings for read-aloud UI.
No cloud. No API key. No text leaving the device for speech generation.
The SDK sends anonymous generation analytics; see Getting started for details and opt-out.
| Runtime | Status | Docs |
|---|---|---|
| Browser | Developer preview | Getting started |
| Node.js | Developer preview | Getting started |
| Plain HTML example | Supported | HTML example |
| Vite React example | Supported | Vite React example |
| Node Express example | Supported | Node Express example |
Install the SDK:
npm install @kittentts/web
Generate audio in memory:
import { KittenTTS } from '@kittentts/web';
const tts = await KittenTTS.create(
{
model: 'nano-int8',
},
(progress) => {
console.log(`setup ${Math.round(progress * 100)}%`);
},
);
const result = await tts.generate('Hello from KittenTTS on the web.');
console.log(result.sampleRate);
console.log(result.wavBase64());
console.log(await result.mp3Base64());
await tts.dispose();
Play audio in a browser:
import {
KittenTTS,
createBrowserAudioPlayer,
} from '@kittentts/web';
const tts = await KittenTTS.create({
player: createBrowserAudioPlayer(),
});
await tts.speak('This voice is generated in the browser.');
Generate audio in Node.js:
import { writeFile } from 'node:fs/promises';
import { KittenTTS } from '@kittentts/web';
const tts = await KittenTTS.create({
model: 'nano-int8',
});
const result = await tts.generate('Generated in Node.js.');
await writeFile('speech.wav', result.wavData());
await writeFile('speech.mp3', await result.mp3Data());
await tts.dispose();
Browser apps can use the SDK directly from a frontend bundle:
import {
KittenTTS,
createBrowserAudioPlayer,
} from '@kittentts/web';
const tts = await KittenTTS.create({
player: createBrowserAudioPlayer(),
});
The SDK configures ONNX Runtime Web wasm assets automatically. Pass ortWasmPath when you need to self-host those files.
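As a minimal sketch of the self-hosting setup, the helper below builds an options object for KittenTTS.create(). The option names model, ortWasmPath, and analytics come from this README; the helper name buildCreateOptions, the /vendor/ort location, the trailing-slash convention, and the assumption that analytics is passed to create() are illustrative only. Check the API reference for the exact value format.

```javascript
// Hypothetical helper: builds an options object to pass to KittenTTS.create().
// The expected format of ortWasmPath (a directory URL ending in "/") is an
// assumption, as is passing `analytics` alongside the other create options.
function buildCreateOptions({ model = 'nano-int8', ortWasmBase, analytics } = {}) {
  const options = { model };
  if (ortWasmBase) {
    // Normalize to exactly one trailing slash so asset file names can be appended.
    options.ortWasmPath = ortWasmBase.replace(/\/+$/, '') + '/';
  }
  if (analytics === false) {
    // Opt out of anonymous generation analytics.
    options.analytics = false;
  }
  return options;
}

const options = buildCreateOptions({ ortWasmBase: '/vendor/ort', analytics: false });
// In an app: const tts = await KittenTTS.create(options);
```

Keeping the options in one place makes it easier to switch between the CDN default and self-hosted assets per environment.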
The plain HTML example can also be opened directly from the filesystem:
npm run example:html
Open http://127.0.0.1:5173, or open examples/html/index.html directly in a browser. Direct file:// usage falls back to in-memory asset storage, so a refresh may download model assets again.
- examples/html - static HTML, CSS, and JavaScript setup.
- examples/vite-react - Vite React browser setup.
- examples/vite-react-word-timings - word highlighting with generated timings.
- examples/node-express - Node Express backend-to-browser playback.
- On-device TTS inference in browsers and Node.js.
- Model download and cache with progress callbacks.
- Offline assets for apps that cannot depend on a first-run download.
- Playback helpers for browser audio and custom audio layers.
- WAV and MP3 output from generated raw PCM samples.
- Word timings for read-aloud highlighting.
- Streaming generation for longer text.
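For read-aloud highlighting, the UI has to map the current playback time to a word. The sketch below assumes a timing shape of { word, start, end } with times in seconds, sorted by start time. That shape and the helper name activeWordIndex are assumptions for illustration; the shape the SDK actually returns is covered in the Word timings docs and may differ.

```javascript
// Assumed timing shape (hypothetical): { word: string, start: number, end: number },
// sorted by start time, with times in seconds.
function activeWordIndex(timings, currentTime) {
  let lo = 0;
  let hi = timings.length - 1;
  // Binary search for the entry with start <= currentTime < end.
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const t = timings[mid];
    if (currentTime < t.start) hi = mid - 1;
    else if (currentTime >= t.end) lo = mid + 1;
    else return mid;
  }
  return -1; // before the first word, after the last, or in a gap between words
}

const sampleTimings = [
  { word: 'Hello', start: 0.0, end: 0.4 },
  { word: 'from', start: 0.45, end: 0.6 },
  { word: 'KittenTTS', start: 0.65, end: 1.3 },
];
```

A UI would typically call a function like this from a timeupdate or requestAnimationFrame handler and toggle a highlight class on the returned index.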
Start with nano-int8 for the smallest download. Use larger models when quality
matters more than size.
| Model | ID | Parameters | Approx. download | Use case |
|---|---|---|---|---|
| Nano int8 | 'nano-int8' | 15M | 25 MB | Smallest app/download size |
| Nano fp32 | 'nano' | 15M | 56 MB | Nano quality without quantization |
| Micro | 'micro' | 40M | 41 MB | Better quality, still compact |
| Mini | 'mini' | 80M | 80 MB | Highest quality option |
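One way to apply the guidance above is to pick the highest-quality model whose approximate download fits a size budget. This is a sketch, not part of the SDK: the helper pickModel and the quality ordering are an interpretation of the "Use case" column, while the IDs and sizes are copied from the table.

```javascript
// Model IDs and approximate download sizes (MB), copied from the table above.
const MODELS = [
  { id: 'nano-int8', downloadMb: 25 },
  { id: 'micro', downloadMb: 41 },
  { id: 'nano', downloadMb: 56 },
  { id: 'mini', downloadMb: 80 },
];

// Assumed quality ranking, best first (an interpretation of the table).
const QUALITY_ORDER = ['mini', 'micro', 'nano', 'nano-int8'];

// Hypothetical helper: returns the best-ranked model ID within the budget,
// or null when no model is small enough.
function pickModel(maxDownloadMb) {
  for (const id of QUALITY_ORDER) {
    const model = MODELS.find((m) => m.id === id);
    if (model.downloadMb <= maxDownloadMb) return id;
  }
  return null;
}
```

The returned ID can be passed as the model option to KittenTTS.create().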
Voices: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo
await tts.speak('Luna speaking.', { voice: 'luna' });
await tts.speak('Slower Bruno speaking.', { voice: 'bruno', speed: 0.85 });
- Docs overview
- Getting started
- Playback
- Offline assets
- Word timings
- Models and voices
- API reference
- Troubleshooting
- Development
- Node.js 20+
- Modern browser with WebAssembly support for browser apps
- Network access to Hugging Face for first-run model downloads, unless assets are preloaded or self-hosted
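The first two requirements can be checked at startup before any model download begins. The helper below is a hypothetical sketch, not part of the SDK; it only inspects the Node.js version (when running in Node.js) and the presence of the WebAssembly global.

```javascript
// Hypothetical helper: reports whether a runtime meets the listed requirements.
// `env` defaults to the current global object and can be replaced for testing.
function checkEnvironment(env = globalThis) {
  const nodeVersion = env.process && env.process.versions && env.process.versions.node;
  const isNode = typeof nodeVersion === 'string';
  const nodeMajor = isNode ? Number.parseInt(nodeVersion.split('.')[0], 10) : null;
  return {
    isNode,
    // Browsers skip the Node.js version check.
    nodeOk: !isNode || nodeMajor >= 20,
    wasmOk:
      typeof env.WebAssembly === 'object' &&
      env.WebAssembly !== null &&
      typeof env.WebAssembly.instantiate === 'function',
  };
}

console.log(checkEnvironment());
```

Failing fast with a clear message is usually friendlier than letting the runtime fail later during model setup.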
Runtime dependencies installed by the SDK:
- onnxruntime-web
- pako
Audio playback is optional. Use createBrowserAudioPlayer() in browsers or pass
a custom AudioPlayer. Use tts.createPlaybackQueue() when multiple generated
clips should play in order instead of interrupting each other.
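To illustrate what playing clips "in order" means, here is a generic promise-chain queue. It is not the SDK's createPlaybackQueue() implementation, whose API is described in the Playback docs; createSerialQueue and the play callback are hypothetical names used only for this sketch.

```javascript
// Generic illustration of in-order playback: each task starts only after the
// previous one has settled. `play` is any function that returns a promise,
// for example one that resolves when a clip finishes playing.
function createSerialQueue() {
  let tail = Promise.resolve();
  return {
    enqueue(play) {
      // Start this task once everything queued before it has settled.
      const run = tail.then(() => play());
      // Swallow errors on the internal chain so one failed clip does not
      // block later clips; the caller still sees the rejection via `run`.
      tail = run.catch(() => {});
      return run;
    },
  };
}
```

Without a queue like this, starting a second clip while the first is still playing would typically either overlap or interrupt it.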
- Add more streaming playback examples.
- Add more browser storage and offline asset examples.
- Continue tracking ONNX Runtime Web compatibility across browsers and Node.js.
- Support future KittenTTS model releases as they become available.
Need something specific? Open an issue.
- Website: kittenml.com
- Repository: KittenML/KittenTTS-web
- Discord: Join the community
- Demo: Hugging Face Spaces
- Issues: GitHub Issues
- Commercial support: contact form
Commercial support is available for teams integrating KittenTTS into their products, including integration assistance, custom voice development, and enterprise licensing.
Contact us or email info@stellonlabs.com to discuss your requirements.
Apache 2.0. See LICENSE.
KittenTTS Web is a developer preview and APIs may change between releases. Generated speech quality, pronunciation, timing metadata, and playback behavior can vary by model, browser, device, and runtime. Review generated audio before using it in production workflows.
The SDK runs speech generation locally after assets are available. Anonymous
generation analytics are enabled by default and can be disabled with
analytics: false.

