Turn your Astro blog posts into narrated audio with word-level synchronization.
- 🎙️ Text-to-Speech Synthesis - Generate natural-sounding audio narration for your content
- 🎯 Word-Level Alignment - Precise timestamps for every word, powered by forced alignment
- ✨ Live Word Highlighting - Karaoke-style highlighting that follows along with playback
- 🎛️ Built-in Audio Player - Accessible player with keyboard shortcuts and mini-player mode
- 🌍 14 Languages - Global reach with support for 14 languages
- 🎨 Fully Themeable - CSS variables for seamless integration with any design
🔗 Live Demo — See the audio player and word highlighting in action
📂 Demo Source Code — Example implementation for reference
- Installation
- Quick Start
- Project Structure
- Configuration
- CLI Commands
- Components
- Supported Languages
- Math Support
- Deployment
- Important: Audio Map
- Customizing Styles
npm install @vocasync/astro
# or
bun add @vocasync/astro
# or
pnpm add @vocasync/astro
Create a vocasync.config.mjs file in your project root:
// vocasync.config.mjs
export default {
collection: {
name: "blog", // Your content collection name
path: "./src/content/blog", // Path to your content
},
};
Update your astro.config.mjs:
// astro.config.mjs
import { defineConfig } from "astro/config";
import vocasync from "@vocasync/astro";
import { rehypeAudioWords } from "@vocasync/astro/rehype";
export default defineConfig({
markdown: {
rehypePlugins: [
[rehypeAudioWords, {
collectionName: "blog", // Must match your collection name
audioMapPath: "src/data/audio-map.json" // Must match output.audioMapPath
}]
]
},
integrations: [vocasync()],
});
Create a .env file:
VOCASYNC_API_KEY=voca_xxxxxxxxxxxxxxxx
Get your API key at vocasync.io
mkdir -p src/data
npx vocasync sync
This will, for each post:
- Read the content from your collection
- Submit a synthesis job, then an explicit alignment job (with the post's transcript)
- Wait for both to complete and fetch the word timings
- Save everything (URLs, keys, timings) to audio-map.json
In your article layout or page:
---
// src/layouts/ArticleLayout.astro
import AudioPlayer from "@vocasync/astro/components/AudioPlayer.astro";
import audioMap from "../data/audio-map.json";
const { post } = Astro.props;
const audioEntry = audioMap.entries[post.slug];
---
<article>
<!-- Audio player at the top -->
<AudioPlayer slug={post.slug} audioEntry={audioEntry} label="Listen to this post" />
<!-- Article content - must have data-article-body for word highlighting -->
<div data-article-body>
<slot />
</div>
</article>
After setup, your project should look like this:
my-astro-site/
├── astro.config.mjs # Astro config with vocasync integration
├── vocasync.config.mjs # VocaSync configuration
├── .env # API key (add to .gitignore)
├── src/
│ ├── content/
│ │ └── blog/ # Your content collection
│ │ ├── my-post.md
│ │ └── another-post.md
│ ├── data/
│ │ └── audio-map.json # Generated - DO NOT DELETE (see below)
│ └── layouts/
│ └── ArticleLayout.astro
└── package.json
Full configuration options:
// vocasync.config.mjs
export default {
// Content collection settings (required)
collection: {
name: "blog", // Collection name
path: "./src/content/blog", // Path to content files
slugField: "slug", // Frontmatter field for slug (optional)
},
// Language for synthesis and alignment (ISO 639-1 code)
// See "Supported Languages" section below for all options
language: "en",
// Synthesis settings
synthesis: {
// alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer
voice: "onyx",
quality: "sd", // sd (standard) or hd (high definition)
format: "mp3", // mp3, aac, opus, flac, wav (sent as outputFormat)
},
// LaTeX/math support
math: {
enabled: false, // Enable math-to-speech conversion
style: "clearspeak", // clearspeak or mathspeak
},
// Output settings
output: {
audioMapPath: "./src/data/audio-map.json",
},
// Frontmatter field to opt-in/out per post
frontmatterField: "audio", // Set `audio: false` in frontmatter to skip
// Processing options
processing: {
concurrency: 3, // Parallel jobs (1-10)
force: false, // Force reprocessing
},
};
Any post can override the global voice, language, or format from its frontmatter.
Values that aren't overridden fall back to vocasync.config.mjs:
---
title: "Bienvenue"
language: fr
voice: shimmer
---
Bonjour…
Invalid values are skipped with a warning. Changing any of these values re-syncs the post, because the change-detection hash includes the resolved voice, language, and format.
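The fallback behavior described above can be sketched as a small pure function. This is only an illustration, not VocaSync's actual implementation: the function name `resolveSettings`, its argument shapes, and the warning text are assumptions, while the allowed values are copied from the configuration comments and the Supported Languages table in this README.

```javascript
// Hypothetical helper: resolve per-post voice/language/format.
// Allowed values come from this README; everything else is an assumption.
const VOICES = ["alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer"];
const FORMATS = ["mp3", "aac", "opus", "flac", "wav"];
const LANGUAGES = ["zh", "cs", "en", "fr", "de", "ja", "ko", "pl", "pt", "ru", "es", "sv", "tr", "uk"];

function resolveSettings(config, frontmatter, warn = console.warn) {
  // Use the override when it is present and valid; otherwise fall back
  // to the global value, warning if the override was invalid.
  const pick = (field, override, fallback, allowed) => {
    if (override === undefined) return fallback;
    if (allowed.includes(override)) return override;
    warn(`Ignoring invalid ${field} "${override}"; using "${fallback}"`);
    return fallback;
  };
  return {
    voice: pick("voice", frontmatter.voice, config.synthesis.voice, VOICES),
    language: pick("language", frontmatter.language, config.language, LANGUAGES),
    format: pick("format", frontmatter.format, config.synthesis.format, FORMATS),
  };
}

// Global config plus the French frontmatter example from above.
const config = { language: "en", synthesis: { voice: "onyx", format: "mp3" } };
const resolved = resolveSettings(config, { language: "fr", voice: "shimmer" });
// resolved.format falls back to the global "mp3"
```

Because the resolved values feed the change-detection hash, any change to the output of a function like this would mark the post for re-sync.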
// In astro.config.mjs
[rehypeAudioWords, {
collectionName: "blog", // Content collection name
audioMapPath: "src/data/audio-map.json", // Path to audio map
classPrefix: "vocasync", // CSS class prefix (default: "vocasync")
}]
# Sync all content (synthesis + alignment)
npx vocasync sync
# Sync a single post
npx vocasync sync --only my-post-slug
# Force reprocessing (ignores cache)
npx vocasync sync --force
# Dry run (preview without API calls)
npx vocasync sync --dry-run
# Use a custom config file
npx vocasync sync --config ./path/to/vocasync.config.mjs
# Check configuration
npx vocasync check
# Check job status
npx vocasync status <projectUuid>
# Show help
npx vocasync help
| Option | Description |
|---|---|
| `--only <slug>` | Only process a specific post by slug |
| `--force` | Force reprocessing, ignore cache |
| `--dry-run` | Preview what would be processed without API calls |
| `--config <path>` | Use a custom config file path |
The main audio player component with word highlighting support.
---
import AudioPlayer from "@vocasync/astro/components/AudioPlayer.astro";
---
<AudioPlayer
slug={post.slug}
label="Listen to this post"
articleSelector="[data-article-body]"
enableHighlighting={true}
enableClickToSeek={true}
enableMiniPlayer={true}
trailLength={4}
/>
| Prop | Type | Default | Description |
|---|---|---|---|
| `slug` | `string` | required | Post slug used to look up audio |
| `audioEntry` | `object` | `undefined` | Audio entry from audio-map.json (audioUrl, synthesisPublishableKey, words, duration) |
| `label` | `string` | `"Listen to this article"` | Accessible label |
| `showPlaceholder` | `boolean` | `true` | Show a message when no audio is available |
| `class` | `string` | `""` | Additional CSS classes |
| `articleSelector` | `string` | `"[data-article-body]"` | Selector for the word highlighting container |
| `enableMiniPlayer` | `boolean` | `true` | Show floating mini player on scroll |
| `enableHighlighting` | `boolean` | `true` | Enable word highlighting |
| `enableClickToSeek` | `boolean` | `true` | Enable click on words to seek audio |
| `trailLength` | `number` | `4` | Number of trailing highlighted words |
When the player is focused (click on it or Tab to it), the following keyboard shortcuts are available:
| Key | Action |
|---|---|
| `Space` | Play/Pause |
| `←` Left Arrow | Seek backward 5 seconds |
| `→` Right Arrow | Seek forward 5 seconds |
| `M` | Toggle mute |
| `H` | Toggle word highlighting |
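To make the shortcut behavior concrete, here is a minimal sketch of the table as a pure state transition. It is not the player's actual code: the state shape (`playing`, `muted`, `highlighting`, `time`, `duration`) and the function name `applyShortcut` are hypothetical, and clamping seeks to the start and end of the track is an assumption. The key strings are standard `KeyboardEvent.key` values.

```javascript
// Hypothetical player state: { playing, muted, highlighting, time, duration },
// with time and duration in seconds.
const SEEK_STEP = 5; // seconds, as listed in the shortcut table

function applyShortcut(state, key) {
  switch (key) {
    case " ": // Space
      return { ...state, playing: !state.playing };
    case "ArrowLeft": // seek backward, never before the start
      return { ...state, time: Math.max(0, state.time - SEEK_STEP) };
    case "ArrowRight": // seek forward, never past the end
      return { ...state, time: Math.min(state.duration, state.time + SEEK_STEP) };
    case "m":
    case "M":
      return { ...state, muted: !state.muted };
    case "h":
    case "H":
      return { ...state, highlighting: !state.highlighting };
    default:
      return state; // unrelated keys leave the state untouched
  }
}
```

In a real component, a `keydown` listener on the focused player would pass `event.key` to a function like this and then apply the new state to the audio element.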
The player includes a highlighter icon button that allows users to toggle word highlighting on/off during playback. This is useful for users who find the highlighting distracting or prefer to just listen.
When enableClickToSeek is enabled (default), clicking on any word in the article will:
- Seek the audio to that word's timestamp
- Start playback if paused
This is useful for jumping to specific parts of an article. Disable it with enableClickToSeek={false} if you prefer words to not be interactive.
For word highlighting to work, wrap your article content with data-article-body:
<div data-article-body>
<Content /> <!-- Your markdown content -->
</div>
The rehype plugin wraps each word in a <span> with timing data at build time.
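As a rough illustration of how per-word timings can drive highlighting with a trail, the sketch below finds the active word for a playback time and the indices of the words trailing behind it. It makes several assumptions not confirmed by this README: each timing is an object with `start` and `end` in seconds, the array is sorted by `start`, and the trail consists of the `trailLength` words immediately before the active one. The function names are hypothetical.

```javascript
// Assumed timing shape: [{ start, end }, ...] in seconds, sorted by start.

// Binary search for the last word whose start time is <= the playback time.
// Returns -1 when playback has not reached the first word yet.
function findActiveWord(words, time) {
  let lo = 0;
  let hi = words.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (words[mid].start <= time) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Active word index plus the indices of up to `trailLength` preceding words.
function highlightRange(words, time, trailLength = 4) {
  const active = findActiveWord(words, time);
  if (active === -1) return { active: -1, trail: [] };
  const trail = [];
  for (let i = Math.max(0, active - trailLength); i < active; i++) trail.push(i);
  return { active, trail };
}
```

Note that this sketch keeps a word active during the silence after it ends, until the next word starts. A stricter variant could also compare `time` against `end`.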
VocaSync supports 14 languages where both speech synthesis and forced alignment are available. Languages use ISO 639-1 codes:
| Code | Language | Code | Language |
|---|---|---|---|
| `zh` | Chinese | `pl` | Polish |
| `cs` | Czech | `pt` | Portuguese |
| `en` | English | `ru` | Russian |
| `fr` | French | `es` | Spanish |
| `de` | German | `sv` | Swedish |
| `ja` | Japanese | `tr` | Turkish |
| `ko` | Korean | `uk` | Ukrainian |
Note: VocaSync requires both speech synthesis and word-level forced alignment for each language. While synthesis (powered by OpenAI TTS) supports 61 languages, alignment (powered by Montreal Forced Aligner) is available for a smaller set. The 14 languages listed above are where both capabilities overlap, and they match the platform's alignment-supported set.
VocaSync supports LaTeX math equations using Speech Rule Engine to convert math to spoken text.
Install the math dependencies. mathjax-full powers both the spoken form and (via
rehype-mathjax) the visual rendering, so you don't also need KaTeX:
bun add remark-math rehype-mathjax mathjax-full speech-rule-engine
Math needs three plugins in a specific order:
// astro.config.mjs
import { defineConfig } from "astro/config";
import vocasync from "@vocasync/astro";
import { rehypeAudioWords, rehypeMathSpeech } from "@vocasync/astro/rehype";
import remarkMath from "remark-math";
import rehypeMathjax from "rehype-mathjax";
const collectionName = "blog";
const audioMapPath = "src/data/audio-map.json";
export default defineConfig({
markdown: {
remarkPlugins: [remarkMath], // parse $...$ / $$...$$
rehypePlugins: [
[rehypeMathSpeech, { collectionName, audioMapPath }], // attach spoken form (before render)
rehypeMathjax, // render math to HTML
[rehypeAudioWords, { collectionName, audioMapPath }], // wrap words + math units
],
},
integrations: [vocasync()],
});
Enable math in vocasync.config.mjs so vocasync sync generates the spoken forms:
export default {
// ...
math: { enabled: true, style: "clearspeak" }, // or "mathspeak"
};
Currency `$` collides with math. With `remark-math` enabled, `$5 … $1200` is parsed as an inline math span. Escape currency dollar signs as `\$` (e.g. `\$5`, `\$1200`) so they're treated as text. VocaSync then speaks them correctly ("five dollars").
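To catch unescaped currency before it reaches `remark-math`, a simple lint pass over the Markdown source can help. The following is a heuristic sketch, not part of VocaSync: the function name `findUnescapedCurrency` is hypothetical, and the rule (an unescaped `$` followed directly by a digit) will also flag legitimate math that starts with a number, such as `$2x$`, so treat hits as candidates for review.

```javascript
// Heuristic: report every "$" that is not preceded by a backslash and is
// immediately followed by a number (optionally with , or . separators).
function findUnescapedCurrency(markdown) {
  const pattern = /(?<!\\)\$\d+(?:[.,]\d+)*/g;
  const hits = [];
  for (const match of markdown.matchAll(pattern)) {
    hits.push({ index: match.index, text: match[0] });
  }
  return hits;
}

// The first amount is unescaped and gets reported; the second is escaped.
const hits = findUnescapedCurrency("Plans start at $5 and go up to \\$1200.");
```

A check like this could run in a pre-commit hook or before `npx vocasync sync`, printing the offsets so the author can decide whether each match is currency or math.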
- During `vocasync sync` (Node/CLI), each LaTeX expression is converted to spoken text (e.g. `$x^2$` → "x squared") and stored in `audio-map.json`. It's spoken and aligned as part of the post's audio.
- At build time, rehypeMathSpeech reads those spoken forms from the audio map and attaches each as a `data-speech` attribute on the math element (math-to-speech never runs inside the Astro/Vite build).
- rehypeMathjax renders the math to visual HTML.
- rehypeAudioWords wraps each math expression as a single highlight unit, so the whole equation lights up together while it's read.
Set the speech style in vocasync.config.mjs:
export default {
// ...
math: {
enabled: true,
style: "clearspeak", // or "mathspeak"
},
};
- clearspeak: Natural, conversational style (recommended)
- mathspeak: More formal, precise mathematical speech
If you have many posts, we recommend running npx vocasync sync once locally before your first deployment:
# Run locally to generate all audio (may take a while)
npx vocasync sync
# Commit the audio-map to version control
git add src/data/audio-map.json
git commit -m "Add audio map"
git push
This approach:
- Prevents long CI/CD build times (important for platforms like Vercel with time limits)
- Ensures only new or changed posts are processed on subsequent builds
- Uses the audio map as a cache, so existing entries are skipped
For ongoing updates, include the sync command in your build script:
{
"scripts": {
"dev": "astro dev",
"build": "npx vocasync sync && astro build",
"preview": "astro preview"
}
}
Since most builds only process new or changed content, this adds minimal time.
Make sure to set VOCASYNC_API_KEY in your deployment environment:
- Vercel: Settings → Environment Variables
- Netlify: Site settings → Environment variables
- GitHub Actions: Repository secrets
name: Deploy
on:
push:
branches: [main]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v1
- run: bun install
- run: bun run build
env:
VOCASYNC_API_KEY: ${{ secrets.VOCASYNC_API_KEY }}
- name: Deploy
# Your deploy step here
The audio-map.json file is the source of truth for VocaSync. Each entry stores:
- The synthesis and alignment project UUIDs (the two-POST flow uses two projects)
- A publishable key for each (the synthesis key streams the audio; the alignment key was used at build time to fetch the timings)
- The word timings (`words`) and any math spoken forms (`mathSpeech`), embedded so the player needs no runtime alignment fetch
- The resolved `voice`/`language`/`format`, a content hash, and timestamps
The audio map format is versioned:
- Version 1/2 (legacy): a single project + one `publishableKey`, no embedded timings.
- Version 3 (current): two projects + two publishable keys, embedded `words` timings, and per-post `voice`/`language`/`format`.
Legacy v1/v2 entries are missing the v3 fields, so the first vocasync sync re-synthesizes
them once to produce complete v3 entries.
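The caching and migration rules described in this README (skip existing entries, reprocess changed posts, upgrade legacy entries once, honor `--force`) can be summarized as one decision function. This is a sketch under stated assumptions, not the CLI's real logic: the field names `version`, `contentHash`, and `words`, the idea that the version is stored per entry, and the function name `syncReason` are all hypothetical.

```javascript
// Returns a short reason string when a post must be (re)processed,
// or null when the cached audio-map entry can be reused.
// Field names on `entry` are assumptions made for illustration.
function syncReason(entry, currentHash, { force = false } = {}) {
  if (force) return "forced"; // --force ignores the cache
  if (!entry) return "new"; // no entry yet for this slug
  const isLegacy = (entry.version ?? 1) < 3 || !Array.isArray(entry.words);
  if (isLegacy) return "legacy"; // v1/v2 entries lack embedded timings
  if (entry.contentHash !== currentHash) return "changed";
  return null; // up to date, skip
}
```

With a function like this, deleting audio-map.json makes every post report "new", which is why the file should be kept under version control.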
If you delete audio-map.json, running npx vocasync sync will re-create synthesis and alignment jobs for ALL content. This will:
- Incur API costs for re-processing everything
- Generate new audio files (old URLs will still work)
To protect the file:
- Commit to version control: Add `audio-map.json` to git
- Back it up: Keep a backup before major changes
- Don't edit manually: Let the CLI manage this file
# Add to git
git add src/data/audio-map.json
git commit -m "Add audio map"
Add your .env file to .gitignore:
# .gitignore
.env
.env.local
Override CSS variables to match your theme:
:root {
/* Player colors */
--vocasync-primary: #3b82f6;
--vocasync-primary-content: white;
--vocasync-surface: #f8fafc;
--vocasync-border: #e2e8f0;
--vocasync-text: #1e293b;
--vocasync-text-muted: #64748b;
/* Word highlighting */
--vocasync-highlight: #10b981;
--vocasync-highlight-text: white;
--vocasync-highlight-active-opacity: 0.25;
--vocasync-highlight-trail-opacity: 0.12;
}
/* Dark mode */
@media (prefers-color-scheme: dark) {
:root {
--vocasync-surface: #1e293b;
--vocasync-border: #334155;
--vocasync-text: #f1f5f9;
--vocasync-text-muted: #94a3b8;
}
}
If the CLI cannot find a configuration file, create a vocasync.config.mjs file in your project root.
If words are not highlighted during playback:
- Make sure the rehype plugin is configured in `astro.config.mjs`
- Check that `collectionName` matches your collection
- Verify `audioMapPath` points to your audio map
- Ensure content is wrapped in `[data-article-body]`
If no audio appears for a post:
- Run `npx vocasync sync` to generate audio
- Check that `audio-map.json` exists and has entries
- Verify the `slug` prop matches your content slug
# Check your configuration
npx vocasync check
# Verify API key is set
echo $VOCASYNC_API_KEY
License: MIT