feat: add Moonshine MLX speech models by pyrex41 · Pull Request #58 · matthartman/ghost-pepper

pyrex41 · 2026-04-10T18:42:20Z

Summary

Adds three Moonshine speech-to-text models via moonshine-mlx, a pure-Swift, Metal-accelerated backend.

Moonshine Tiny (43M params, ~170 MB) — fastest option
Moonshine Small (147M params, ~590 MB) — balanced speed/accuracy
Moonshine Medium (245M params, ~980 MB) — best Moonshine accuracy

The models appear in the Settings picker and are downloaded from HuggingFace on first use. They are available only on Apple Silicon (gated with #if arch(arm64)).
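The sketch below illustrates the two behaviors described above: compile-time gating to arm64 and download-on-first-use. The function names `moonshineModelNames` and `needsDownload` are hypothetical, and so is the assumption that "first use" means "no local model folder yet". The real ModelManager code is not shown in this PR.

```swift
import Foundation

// Hypothetical helper: which Moonshine models the picker would list on this CPU.
// Mirrors the PR's `#if arch(arm64)` gate; on Intel builds nothing is offered.
func moonshineModelNames() -> [String] {
    #if arch(arm64)
    return ["Moonshine Tiny", "Moonshine Small", "Moonshine Medium"]
    #else
    return []
    #endif
}

// Hypothetical first-use check (assumption): a model is fetched only when
// nothing exists yet at its expected local path.
func needsDownload(_ localDirectory: URL) -> Bool {
    !FileManager.default.fileExists(atPath: localDirectory.path)
}
```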

Moonshine is optimized for on-device inference and has a different accuracy/speed tradeoff than Whisper, which makes it a useful additional option for users.

Changes

project.yml — add MoonshineMLX package dependency
SpeechModelCatalog.swift — add .moonshineMLX backend and three model descriptors
ModelManager.swift — add Moonshine model loading and transcription
README.md — document Moonshine models in speech model table
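Here is a rough sketch of what a `.moonshineMLX` backend case with three descriptors could look like. Only the `.moonshineMLX` name and the model figures come from this PR. The `whisperKit` case, the struct shape, and its field names are hypothetical. The sizes in the summary work out to about 4 bytes per parameter (170 MB / 43M ≈ 3.95, 590 MB / 147M ≈ 4.01, 980 MB / 245M = 4.0), which matches 32-bit float weights. That is an inference from the numbers and is not stated in the PR.

```swift
// Hypothetical backend enum; only `.moonshineMLX` is named in the PR.
enum SpeechBackend {
    case whisperKit
    case moonshineMLX
}

// Hypothetical descriptor shape; the real SpeechModelCatalog fields are not shown.
struct ModelDescriptor {
    let displayName: String
    let backend: SpeechBackend
    let parameterCount: Double  // number of parameters
    let approxSizeMB: Double    // advertised download size (1 MB = 1_000_000 bytes assumed)

    // Approximate bytes stored per parameter, derived from the advertised size.
    var bytesPerParameter: Double { approxSizeMB * 1_000_000 / parameterCount }
}

// Figures taken from the PR summary.
let moonshineModels: [ModelDescriptor] = [
    ModelDescriptor(displayName: "Moonshine Tiny", backend: .moonshineMLX, parameterCount: 43e6, approxSizeMB: 170),
    ModelDescriptor(displayName: "Moonshine Small", backend: .moonshineMLX, parameterCount: 147e6, approxSizeMB: 590),
    ModelDescriptor(displayName: "Moonshine Medium", backend: .moonshineMLX, parameterCount: 245e6, approxSizeMB: 980),
]
```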

Test plan

Build succeeds
Moonshine models appear in Settings → Speech Model picker
Selecting a Moonshine model downloads from HuggingFace
Transcription works with Moonshine models

🤖 Generated with Claude Code

Fix overlay constraint crash

Models section shows each model with loaded/not loaded status. Taller settings window (580px) to fit all sections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Shows when any model isn't loaded. Downloads WhisperKit and/or cleanup models directly from Settings, so there's no need to re-run onboarding.
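The following is a small sketch of the "shows when any model isn't loaded" rule. It assumes each model exposes a simple loaded flag and that only the missing models are fetched. The `ModelStatus` type and both helper names are hypothetical.

```swift
// Hypothetical status record; the real Settings view model is not shown in the PR.
struct ModelStatus {
    let name: String
    let isLoaded: Bool
}

// The download affordance appears whenever at least one model is not loaded.
func shouldOfferDownload(_ statuses: [ModelStatus]) -> Bool {
    statuses.contains { !$0.isLoaded }
}

// Assumption: only the missing models are downloaded; loaded ones are left alone.
func modelsToDownload(_ statuses: [ModelStatus]) -> [String] {
    statuses.filter { !$0.isLoaded }.map { $0.name }
}
```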

LLM.swift's bundled llama.cpp doesn't support the Qwen3/3.5 architecture. Reverted to Qwen 2.5 1.5B and 3B, which work reliably. Will upgrade when LLM.swift updates its llama.cpp.

Info.plist was still at v1.1, so Sparkle thought the "update" was the same version. Now properly at v1.3 build 4. Fixes #2.
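Sparkle offers an update only when the advertised version is strictly newer than the version the installed app reports. An equal version is therefore not treated as an update. The sketch below uses a simplified dotted-number comparison to illustrate this. The `isNewer` name is hypothetical, and this is not Sparkle's own comparator, which handles more version formats.

```swift
// Simplified dotted-version comparison (hypothetical; not Sparkle's implementation).
// Missing or non-numeric components are treated as 0, so "1.1" == "1.1.0".
func isNewer(_ candidate: String, than installed: String) -> Bool {
    let a = candidate.split(separator: ".").map { Int($0) ?? 0 }
    let b = installed.split(separator: ".").map { Int($0) ?? 0 }
    for i in 0..<max(a.count, b.count) {
        let x = i < a.count ? a[i] : 0
        let y = i < b.count ? b[i] : 0
        if x != y { return x > y }  // first differing component decides
    }
    return false  // equal versions are not an update
}
```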

Improve post-paste learning via Accessibility

Picker in Settings under the Input section. Switching models triggers a re-download and reload. Default remains small.en for accuracy; tiny.en is ~75 MB and much faster for shorter recordings.
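One way to express "switching models triggers re-download and reload" is a property observer on the selected model. In the sketch below, the `ModelSelection` class is a hypothetical stand-in for the real model manager. The download step is only noted in a comment, and re-selecting the same model is assumed to be a no-op.

```swift
// Hypothetical stand-in for the model manager; tracks which Whisper model is loaded.
final class ModelSelection {
    private(set) var loadedModel: String?
    private(set) var loadCount = 0

    // Changing the picker value reloads the model, but only if it actually changed.
    var selectedModel: String {
        didSet {
            if selectedModel != oldValue { reload() }
        }
    }

    init(defaultModel: String = "small.en") {
        selectedModel = defaultModel  // didSet does not fire during init
        reload()
    }

    private func reload() {
        loadedModel = nil  // unload the current model first
        // A real implementation would download the weights here if missing (not shown).
        loadedModel = selectedModel
        loadCount += 1
    }
}
```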

…oggle) Matches the original Ghost Pepper behavior.

The Input Monitoring prompt doesn't reliably show the system dialog for debug-signed apps. Now attempts to start the hotkey monitor even without it, since Accessibility alone is sufficient for the Control key.

Fixed the codesign verification failure reported in #4. Now verifies the signature after extracting from the DMG, before release. Also includes: speech model picker, default Control shortcuts, non-blocking Input Monitoring check.