feat: add Moonshine MLX speech models #58

Closed
pyrex41 wants to merge 230 commits into matthartman:main from pyrex41:feat/moonshine-mlx-upstream

pyrex41 commented Apr 10, 2026

Summary

Adds three Moonshine speech-to-text models via the moonshine-mlx pure-Swift Metal-accelerated backend.

  • Moonshine Tiny (43M params, ~170 MB) — fastest option
  • Moonshine Small (147M params, ~590 MB) — balanced speed/accuracy
  • Moonshine Medium (245M params, ~980 MB) — best Moonshine accuracy

Models appear in the Settings picker and download from HuggingFace on first use. They are available only on Apple Silicon (the code is guarded by #if arch(arm64)).

Moonshine is optimized for on-device inference and offers a different accuracy/speed tradeoff compared to Whisper — worth having as another option for users.

Changes

  • project.yml — add MoonshineMLX package dependency
  • SpeechModelCatalog.swift — add .moonshineMLX backend and three model descriptors
  • ModelManager.swift — add Moonshine model loading and transcription
  • README.md — document Moonshine models in speech model table
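
As a rough illustration of the SpeechModelCatalog.swift change, the sketch below shows how three descriptors might sit behind an Apple Silicon guard. The type and property names (`SpeechBackend`, `SpeechModelDescriptor`, `moonshineModels`) are assumptions made for this example and are not taken from the PR's code. Only the model names, parameter counts, and approximate sizes come from the summary above.

```swift
// Hypothetical sketch: these names are illustrative and are not taken
// from the actual SpeechModelCatalog.swift.
enum SpeechBackend {
    case whisperKit
    case moonshineMLX
}

struct SpeechModelDescriptor {
    let displayName: String
    let backend: SpeechBackend
    let parameterCountMillions: Int
    let approximateSizeMB: Int
}

// The Moonshine entries are compiled in only on Apple Silicon;
// other architectures get an empty list.
#if arch(arm64)
let moonshineModels: [SpeechModelDescriptor] = [
    SpeechModelDescriptor(displayName: "Moonshine Tiny", backend: .moonshineMLX,
                          parameterCountMillions: 43, approximateSizeMB: 170),
    SpeechModelDescriptor(displayName: "Moonshine Small", backend: .moonshineMLX,
                          parameterCountMillions: 147, approximateSizeMB: 590),
    SpeechModelDescriptor(displayName: "Moonshine Medium", backend: .moonshineMLX,
                          parameterCountMillions: 245, approximateSizeMB: 980),
]
#else
let moonshineModels: [SpeechModelDescriptor] = []
#endif
```

Keeping the `#else` branch as an empty array means a picker built from this list would need no architecture checks of its own.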

Test plan

  • Build succeeds
  • Moonshine models appear in Settings → Speech Model picker
  • Selecting a Moonshine model downloads from HuggingFace
  • Transcription works with Moonshine models

🤖 Generated with Claude Code

matthartman and others added 30 commits March 23, 2026 14:53
Models section shows each model with loaded/not loaded status.
Taller settings window (580px) to fit all sections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shows when any model isn't loaded. Downloads WhisperKit and/or
cleanup models directly from Settings — no need to re-run onboarding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLM.swift's bundled llama.cpp doesn't support Qwen3/3.5 architecture.
Reverted to Qwen 2.5 1.5B + 3B which work reliably.
Will upgrade when LLM.swift updates its llama.cpp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Info.plist was still at v1.1 so Sparkle thought the "update"
was the same version. Now properly at v1.3 build 4.
Fixes #2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Improve post-paste learning via Accessibility
Picker in Settings under Input section. Switching models triggers
re-download and reload. Default remains small.en for accuracy.
tiny.en is ~75 MB and much faster for shorter recordings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…oggle)

Matches the original Ghost Pepper behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Input Monitoring prompt doesn't reliably show the system dialog
for debug-signed apps. Now attempts to start the hotkey monitor
even without it — Accessibility alone is sufficient for Control key.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixed codesign verification failure reported in #4.
Now verifies signature after extracting from DMG before release.

Also includes: speech model picker, default Control shortcuts,
non-blocking Input Monitoring check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
macOS kills the app after Screen Recording is granted but doesn't
relaunch it. Now spawns a background process that reopens the app
after 3 seconds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace no-audio modal with status pill
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses #5 — users should know about local transcript log
and auto-launch behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
File-based logging was replaced by in-memory DebugLogStore.
Nothing is written to disk. Updated disclosure accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No longer blocks Continue button. Shown as "(optional)" with
a bordered (not prominent) Enable button. Users can skip it
and enable later in Settings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Matches WhisperFlow and other dictation tools.
Toggle is Right Command + Right Option + Space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Supports Spanish and 90+ other languages. Same size as small.en.
Users can tweak the cleanup prompt for their language's filler words.
Addresses #6.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
obra and others added 27 commits April 17, 2026 15:50
Reduce transcription latency and improve speaker tagging
…idebar, detect meeting name & attendees

- Inline summary prompt editor with model picker on Summary tab (click "Customize")
- "Imported from Granola" badge on imported meeting files
- Granola files show green import icon in sidebar, no speaker badges on transcript
- Resizable sidebar (drag divider, 160-400px range)
- Sidebar open by default when Meetings window opens
- "Detect" button grabs meeting name from Zoom/Teams window title + OCR attendees
- Auto-detect scans known meeting apps even for manual recordings
- Attendee OCR retries at 3s, 15s, 30s, 60s to catch late joiners
- Attendees accumulate across retries (no duplicates)
- Title auto-update only marks as done on success (retries on failure)
- "Zoom Meeting" cleaned from window titles
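
The retry-and-accumulate behavior in the bullets above could look roughly like the following. This is a minimal sketch under stated assumptions: `AttendeeAccumulator` and its members are hypothetical names, and the case-insensitive comparison is a guess about how duplicates are detected, not something the commit message states.

```swift
import Foundation

// Hypothetical sketch: names are illustrative, not from the app.
struct AttendeeAccumulator {
    // Retry delays in seconds, as listed in the commit message.
    static let retryDelays: [TimeInterval] = [3, 15, 30, 60]

    private(set) var attendees: [String] = []
    private var seen = Set<String>()

    // Merge the names from one OCR pass, keeping first-seen order and
    // skipping names already recorded. Comparing case-insensitively
    // is an assumption made for this example.
    mutating func merge(_ names: [String]) {
        for name in names {
            let trimmed = name.trimmingCharacters(in: .whitespacesAndNewlines)
            guard !trimmed.isEmpty else { continue }
            if seen.insert(trimmed.lowercased()).inserted {
                attendees.append(trimmed)
            }
        }
    }
}

var accumulator = AttendeeAccumulator()
accumulator.merge(["Ada", "Grace"])    // first OCR pass
accumulator.merge(["grace ", "Linus"]) // later pass, one late joiner
// accumulator.attendees is now ["Ada", "Grace", "Linus"]
```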

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rowser meetings

- Test dictation now uses selected mic (was always using system default)
- Settings mic picker initializes from saved selection, not system default
- Detect button activates meeting app (including browsers) before OCR
- Added browser bundle IDs (Brave, Chrome, Arc, Safari, Firefox) for Google Meet detection
- Falls back to frontmost non-Ghost Pepper app if no known meeting app found
- Attendee chips UI instead of plain text
- Debug logging for attendee OCR capture
- Cleaned "Zoom Meeting" from window titles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- One-time migration: enables meetingTranscriptEnabled for existing users
  (detects update by checking if selectedCleanupModelKind exists but
  meetingTranscriptEnabled was never set)
- Shows "What's New" NSAlert on first launch after update with
  "Open Meetings" and "Got It" buttons
- Persists hasSeenMeetingTranscriptAnnouncement to only show once
- Fix: Settings mic picker initializes from saved selection, not system default
- Fix: test dictation targets selected mic device
- Fix: prioritize native Zoom over browsers for Detect button
- Fix: filter out ZM_ internal Zoom windows from title detection
- Fix: strip pronouns from attendee names (she/her, he/him, they/them)
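
One way the pronoun-stripping fix might work is a single regular-expression pass over each OCR'd name. The helper below is hypothetical: `stripPronouns(from:)` is not a function from this repository, and the pattern covers only the three pronoun pairs named in the commit message.

```swift
import Foundation

// Hypothetical helper, not the app's actual implementation.
// Removes a pronoun pair, with or without surrounding brackets,
// along with the whitespace in front of it.
func stripPronouns(from name: String) -> String {
    let pattern = #"(?i)\s*[\(\[]?\b(?:she/her|he/him|they/them)\b[\)\]]?"#
    let stripped = name.replacingOccurrences(
        of: pattern,
        with: "",
        options: .regularExpression
    )
    return stripped.trimmingCharacters(in: .whitespaces)
}
```

With this pattern, a name such as "Jane Doe (she/her)" reduces to "Jane Doe", and a name with no pronouns passes through unchanged.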

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- UpdaterController checks appcast XML 15s after launch to compare versions
- Menu bar dropdown shows 'Update Available — Install Now' in orange when update found
- Sparkle still handles the actual update dialog and installation
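
The version comparison mentioned above can be sketched as a component-wise check on dotted version strings. The function name `isNewer(_:than:)` is hypothetical, and this is neither the app's UpdaterController code nor Sparkle's own comparison logic. It only shows the general idea of deciding whether the appcast advertises a newer version.

```swift
// Hypothetical helper: returns true when `remote` is a newer dotted
// version than `local`. Missing or non-numeric components count as 0.
func isNewer(_ remote: String, than local: String) -> Bool {
    let remoteParts = remote.split(separator: ".").map { Int($0) ?? 0 }
    let localParts = local.split(separator: ".").map { Int($0) ?? 0 }
    for index in 0..<max(remoteParts.count, localParts.count) {
        let r = index < remoteParts.count ? remoteParts[index] : 0
        let l = index < localParts.count ? localParts[index] : 0
        if r != l { return r > l }
    }
    return false // identical versions are not "newer"
}
```

Comparing integers per component avoids the string-ordering pitfall where "1.10" would sort before "1.9".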

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add three Moonshine speech-to-text models (Tiny, Small, Medium) via the
moonshine-mlx pure-Swift Metal-accelerated backend. Models appear in the
Settings picker and download from HuggingFace on first use.

- Moonshine Tiny (43M params, ~170 MB) — fastest
- Moonshine Small (147M params, ~590 MB) — balanced
- Moonshine Medium (245M params, ~980 MB) — best accuracy

Moonshine models are optimized for on-device inference and offer a
different accuracy/speed tradeoff compared to Whisper.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

6 participants