Dalston

License: Apache 2.0 · Python 3.11+

Ollama for ASR. Run open-source speech recognition models on your machine or private cloud. Freedom from proprietary APIs, full privacy, no quality compromise.

[Screenshot: Dalston transcription console, pipeline view with prepare → transcribe → diarize stages and speaker-labeled transcript]

Why Dalston

Pluggable and extensible — Mix and match transcription, alignment, diarization, and PII detection models. Swap components without breaking your pipeline. Completely open source and free.

Drop-in integration — OpenAI and ElevenLabs compatible APIs mean you can point your existing code at Dalston and it just works. Need more power? The native Dalston API unlocks advanced functionality like multi-engine routing, pipeline customization, and detailed engine metadata.
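
As a rough illustration of what "point your existing code at Dalston" could look like, the sketch below builds an OpenAI-style transcription request using only the Python standard library. The `/v1/audio/transcriptions` path and the `model` and `file` form fields come from OpenAI's public API. The host, port, file name, and the use of `distil-small` as a model id are assumptions for illustration, not confirmed Dalston defaults.

```python
import urllib.request
import uuid

# Assumption: a Dalston server is reachable here; host and port are placeholders.
DALSTON_BASE_URL = "http://localhost:8000"


def build_transcription_request(base_url, audio_bytes, filename, model):
    """Build a multipart POST shaped like OpenAI's audio transcription call."""
    boundary = uuid.uuid4().hex
    parts = [
        # Form field naming the model (the value passed in is a placeholder).
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode(),
        # File field header, followed by the raw audio bytes.
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode(),
        audio_bytes,
        # Closing boundary that terminates the multipart body.
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Dummy bytes stand in for a real audio file.
request = build_transcription_request(
    DALSTON_BASE_URL, b"\x00\x01\x02", "meeting.wav", "distil-small"
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted because it needs a running server.
```

If you already use an OpenAI client library, the equivalent change is usually just overriding its base URL, leaving the rest of the calling code untouched.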

Cheap to run: make dev is free. A 1-hour podcast on a spot GPU costs cents. A 24/7 ElevenLabs/OpenAI-compatible API on AWS runs around $87/month all-in. See the cost estimator.

What It Does

Transcribe audio files or live streams with speaker diarization, word-level timestamps, and GPU acceleration. Run it on your own infrastructure.

# One-command local transcription (M57 zero-config bootstrap)
# - auto-starts local server if missing
# - auto-ensures default model (distil-small)
DALSTON_SECURITY_MODE=none dalston transcribe tests/audio/test_merged.wav --format json
{
  "text": "Hello, welcome to the meeting...",
  "segments": [
    {"speaker": "SPEAKER_01", "start": 0.0, "end": 2.5, "text": "Hello, welcome to the meeting."},
    {"speaker": "SPEAKER_02", "start": 2.8, "end": 5.1, "text": "Thanks for having me."}
  ]
}
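
The JSON output above is straightforward to post-process. The following minimal sketch assumes only the fields visible in that sample (`text`, `segments`, and per-segment `speaker`, `start`, `end`, `text`). The helper names `format_transcript` and `talk_time` are hypothetical and not part of Dalston.

```python
import json

# Sample copied from the output shown above.
raw = """
{
  "text": "Hello, welcome to the meeting...",
  "segments": [
    {"speaker": "SPEAKER_01", "start": 0.0, "end": 2.5, "text": "Hello, welcome to the meeting."},
    {"speaker": "SPEAKER_02", "start": 2.8, "end": 5.1, "text": "Thanks for having me."}
  ]
}
"""


def format_transcript(result):
    """Render one line per segment with its time range and speaker label."""
    lines = []
    for seg in result["segments"]:
        lines.append(
            f'[{seg["start"]:.1f}s - {seg["end"]:.1f}s] {seg["speaker"]}: {seg["text"]}'
        )
    return "\n".join(lines)


def talk_time(result):
    """Sum the duration of all segments per speaker, in seconds."""
    totals = {}
    for seg in result["segments"]:
        duration = seg["end"] - seg["start"]
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + duration
    return totals


result = json.loads(raw)
print(format_transcript(result))
for speaker, seconds in talk_time(result).items():
    print(f"{speaker}: {seconds:.1f}s")
```

In practice you would read the JSON from the CLI's standard output or from a saved file rather than from an inline string.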

Quick Start

git clone https://github.com/ssarunic/dalston.git && cd dalston
make dev      # full local stack on Docker

For zero-Docker single-process mode or AWS deployment, see the guides.

Features

  • Batch & Real-time — File uploads or WebSocket streaming
  • Speaker Diarization — Identify who said what
  • Word Timestamps — Precise timing for every word
  • OpenAI & ElevenLabs Compatible — Drop-in replacement for existing integrations
  • Modular Engines — Faster Whisper, NeMo Parakeet, Voxtral, Pyannote, and more
  • Private by Default — Runs entirely on your infrastructure, no data leaves your environment

Documentation

Start here:

Engineering reference:

License

Apache 2.0
