Parlai — AI Text to Speech

Free, high-quality AI text-to-speech Chrome extension.
Select text, right-click, and parley. Everything runs locally in your browser.


Description

Parlai turns any webpage into a spoken experience. Highlight text, right-click, and hear it read aloud by AI voices — no servers, no accounts, no data leaves your browser.

Features:

  • Dual engine — GPU mode (Supertonic 2, WebGPU) for fast, high-quality speech. CPU mode (Piper VITS, WASM) as a fallback that works on any machine.
  • 17 AI voices — 10 GPU voices and 7 CPU voices (4 shared across languages plus 3 native-accent voices) with distinct styles and accents.
  • 4 languages — English, Spanish, Portuguese, French. Auto-translates highlighted text to the selected language before speaking.
  • Pitch-preserving speed control — Slow (0.8x), Normal (1x), Fast (1.4x) via RubberBand WASM. No chipmunk effect.
  • Transport controls — Play/pause, skip forward/back 10s and 30s.
  • 100% local — All models bundled in the extension (~1.1GB). No API keys, no cloud, no tracking.
  • Smart text cleanup — Strips Wikipedia citations, expands numbers to words (1950 → "nineteen fifty"), normalizes symbols.
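
The text cleanup step can be sketched roughly as below. This is a minimal illustration, not the extension's actual code: the function names (`cleanText`, `yearToWords`, `twoDigits`) are hypothetical, the number expansion only covers four-digit years from 1000 to 2999, and the symbol rules (`&` and `%`) are assumed examples of normalization.

```javascript
// Hypothetical helpers illustrating the cleanup described above.
const ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
  "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
  "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
  "seventy", "eighty", "ninety"];

// Spell out an integer from 0 to 99.
function twoDigits(n) {
  if (n < 20) return ONES[n];
  const tens = TENS[Math.floor(n / 10)];
  return n % 10 === 0 ? tens : `${tens}-${ONES[n % 10]}`;
}

// Read a four-digit year as two pairs: 1950 -> "nineteen fifty".
function yearToWords(year) {
  const high = Math.floor(year / 100);
  const low = year % 100;
  const roundMillennium = high % 10 === 0; // 1000, 2000, ...
  if (low === 0) {
    return roundMillennium ? `${ONES[high / 10]} thousand` : `${twoDigits(high)} hundred`;
  }
  if (low < 10) {
    return roundMillennium
      ? `${ONES[high / 10]} thousand ${ONES[low]}` // 2005 -> "two thousand five"
      : `${twoDigits(high)} oh ${ONES[low]}`;      // 1905 -> "nineteen oh five"
  }
  return `${twoDigits(high)} ${twoDigits(low)}`;
}

function cleanText(text) {
  return text
    .replace(/\[(?:\d+|citation needed)\]/g, "")              // Wikipedia-style markers such as [12]
    .replace(/\b[12]\d{3}\b/g, (m) => yearToWords(Number(m))) // standalone four-digit years
    .replace(/&/g, " and ")                                   // assumed symbol rules
    .replace(/%/g, " percent")
    .replace(/\s+/g, " ")                                     // collapse leftover whitespace
    .trim();
}
```

Calling `cleanText("Born in 1950,[3] she won 40% & more.")` yields a string with the citation removed, the year spelled out, and the symbols replaced by words. Other numbers, such as the `40` here, are left as digits in this sketch.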

Supported languages:

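| Language   | Translation    | GPU voices | CPU voices                  |
|------------|----------------|------------|-----------------------------|
| English    |                | 10         | 4                           |
| Spanish    | Auto-translate | 10         | 4 + Carlos (native accent)  |
| Portuguese | Auto-translate | 10         | 4 + Lucas (native accent)   |
| French     | Auto-translate | 10         | 4 + Camille (native accent) |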

Translation uses Chrome's built-in Translator API (local, if enabled) with a fallback to Google Translate.
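
The local-first, fallback-second order can be sketched as a small helper that tries translators in sequence. This is an assumption about the control flow only: `translateWithFallback` is a hypothetical name, and the translators passed in are stand-ins for whatever wraps Chrome's Translator API and the Google Translate fallback in the real code.

```javascript
// Try each translator in order and return the first successful result.
// `translators` is a list of hypothetical async functions with the shape
// (text, targetLanguage) => translated string; each one may throw.
async function translateWithFallback(text, targetLanguage, translators) {
  let lastError = new Error("no translators configured");
  for (const translate of translators) {
    try {
      return await translate(text, targetLanguage);
    } catch (err) {
      lastError = err; // remember why this one failed, then try the next
    }
  }
  throw lastError; // every translator failed
}
```

A caller would pass the local translator first and the remote one second, so the remote service is only contacted when the local one throws (for example, because the built-in API is not enabled).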

Development

Prerequisites

  • Git LFS — ONNX model files (~1GB) are stored with Git Large File Storage
  • Node.js 18+
  • Chrome 120+

Setup

git lfs install          # one-time setup
git clone https://github.com/corysimmons/parlai.git
cd parlai/extension
npm install

Note: If you cloned without LFS, run git lfs pull to download the model files.

Build

node build.mjs

This produces two bundles:

  • offscreen.js — Main thread (Supertonic engine, audio playback, RubberBand)
  • piper-worker.js — Web Worker (Piper phonemize + ONNX inference, off the main thread)

Load in Chrome

  1. Go to chrome://extensions
  2. Enable "Developer mode"
  3. Click "Load unpacked" and select the extension/ directory
  4. Select text on any page, right-click → "Parlai — AI Text to Speech"

Project structure

extension/
├── background.js          # Service worker: context menu, message relay, translation
├── offscreen.src.js       # Offscreen doc source: playback, Supertonic, RubberBand
├── offscreen.js           # Built bundle (do not edit)
├── piper-worker.src.js    # Piper Web Worker source: phonemize + ONNX inference
├── piper-worker.js        # Built bundle (do not edit)
├── piper-engine.js        # Piper engine module (used by worker)
├── supertonic.js          # Supertonic 2 TTS engine
├── popup.html/css/js      # Extension popup UI
├── content.js/css         # Page toasts
├── build.mjs              # esbuild config
├── calibrate.wav          # Audio latency calibration tone
├── rubberband.wasm        # RubberBand time-stretching
├── models/                # Supertonic ONNX models (~253MB, gitignored)
├── piper/                 # Piper ONNX voice models + espeak-ng WASM (~774MB, gitignored)
├── voices/                # Supertonic voice style JSONs
├── ort/                   # ONNX Runtime Web (WebGPU + WASM)
└── icons/                 # Extension icons

Architecture

Text is chunked and synthesized one chunk at a time. Each chunk is independently time-stretched via RubberBand WASM, then played through an Audio element. The first chunk plays while the rest generate in the background.
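
A minimal sketch of this chunked pipeline follows. It makes several assumptions: chunks are split on sentence punctuation, only one chunk is synthesized ahead of the one being played, and `synthesize` and `play` are hypothetical stand-ins for the TTS engine (including time-stretching) and for Audio element playback.

```javascript
// Split text into sentence-sized chunks (assumed splitting rule).
function splitIntoChunks(text) {
  const sentences = text.match(/[^.!?]+[.!?]*/g) || [];
  return sentences.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Synthesize chunks in order while the previous chunk is playing.
// `synthesize(chunk)` resolves to audio data; `play(audio)` resolves when
// playback of that chunk has finished. Both are stand-ins.
async function speak(text, synthesize, play) {
  const chunks = splitIntoChunks(text);
  if (chunks.length === 0) return;
  let next = synthesize(chunks[0]);       // start work on the first chunk
  for (let i = 0; i < chunks.length; i++) {
    const audio = await next;             // wait for the current chunk
    if (i + 1 < chunks.length) {
      next = synthesize(chunks[i + 1]);   // begin the next chunk before playing
      next.catch(() => {});               // the error resurfaces at the await above
    }
    await play(audio);                    // playback overlaps with synthesis
  }
}
```

Because the next chunk starts synthesizing before `play` is awaited, audio begins after the first chunk is ready rather than after the whole text has been processed.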

  • GPU path: Supertonic 2 runs 4 ONNX models (duration predictor, text encoder, vector estimator, vocoder) via WebGPU in the offscreen document.
  • CPU path: Piper runs espeak-ng phonemization + VITS ONNX inference in a dedicated Web Worker so the main thread stays responsive.
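
If the choice between the two paths were made automatically, it could look something like the sketch below. This is an assumption, since the text above does not say how the mode is selected; `pickEngine` is a hypothetical name. The only real API used is WebGPU's `requestAdapter()`, which resolves to `null` when no suitable adapter exists.

```javascript
// Return "gpu" when a WebGPU adapter is available, otherwise "cpu".
// `gpu` is expected to be `navigator.gpu`, which is undefined in
// environments without WebGPU support.
async function pickEngine(gpu) {
  if (!gpu) return "cpu";
  try {
    const adapter = await gpu.requestAdapter(); // null if no adapter fits
    return adapter ? "gpu" : "cpu";
  } catch {
    return "cpu"; // treat any WebGPU failure as "use the CPU path"
  }
}
```

In an extension context this might be called as `pickEngine(navigator.gpu)`. Passing the object in as a parameter keeps the function easy to exercise outside a browser.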

License

MIT
