Description
Problem
When running speak, the output does not confirm which model is being used. Because the model can come from a built-in default, a config file, or an explicit flag, it is unclear which model is actually running.
Perspective: As an AI agent, I rely on the defaults (fp16 for quality) described in the skill documentation. But I have no confirmation that fp16 is actually being used rather than a config override or some other default.
Current Output
speak v0.1.0
Generating audio for 6208 characters...
→ Starting TTS server...
✓ TTS server started
Streaming audio with adaptive buffering...
Which model? Nothing in this output says.
Desired Output
speak v0.1.0
Model: mlx-community/chatterbox-turbo-fp16 (16-bit, best quality)
Voice: ~/.chatter/voices/morgan_freeman3.wav
Temp: 0.7 | Speed: 1.0
Generating audio for 6208 characters...
→ Starting TTS server...
✓ TTS server started
Streaming audio with adaptive buffering...
Or more concise:
speak v0.1.0 | fp16 | morgan_freeman3.wav | temp:0.7 speed:1.0
Generating audio for 6208 characters...
Why This Matters
1. Verify defaults are working
The docs say "default is fp16", but is it? There is no way to confirm this without checking the source code or config.
2. Debugging performance issues
If generation is slow, I need to know: Am I using 8bit? fp16? 4bit?
3. Confirming user intent
User says "use high quality" → I choose fp16 → Output should confirm this choice
4. Reproducibility
When reporting issues, I need to know the exact model used: "Issue occurred with chatterbox-turbo-fp16 at temp 0.7"
Proposed Solution
Minimal: Show model name at start
Model: mlx-community/chatterbox-turbo-fp16
Ideal: Show full context
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
speak v0.1.0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Model: mlx-community/chatterbox-turbo-fp16
Voice: morgan_freeman3.wav
Params: temp=0.7 speed=1.0
Input: 6,208 chars (~400s estimated)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
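The banner above could be produced from the already-resolved settings with a small formatting helper. This is only a sketch of the idea in Python: the `Settings` class, its field names, and `format_banner` are hypothetical and are not taken from the speak source, and the time estimate is left out because the issue does not say how it would be computed.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Hypothetical container; fields mirror the values shown in this issue.
    model: str
    voice: str
    temp: float
    speed: float


def format_banner(version: str, s: Settings, n_chars: int, width: int = 38) -> str:
    """Build the startup banner as a single string, one setting per line."""
    rule = "━" * width
    lines = [
        rule,
        f"speak v{version}",
        rule,
        f"Model: {s.model}",
        f"Voice: {s.voice}",
        f"Params: temp={s.temp} speed={s.speed}",
        f"Input: {n_chars:,} chars",  # thousands separator, e.g. 6,208
        rule,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    settings = Settings(
        model="mlx-community/chatterbox-turbo-fp16",
        voice="morgan_freeman3.wav",
        temp=0.7,
        speed=1.0,
    )
    print(format_banner("0.1.0", settings, 6208))
```

Building the banner from the final resolved values, rather than from the raw flags, is what makes it useful as confirmation: it reports what will actually run.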
Verbose mode: Add --verbose for even more detail
speak file.txt --play --verbose
> Config loaded from: ~/.chatter/config.toml
> Model: mlx-community/chatterbox-turbo-fp16 (default)
> Voice: morgan_freeman3.wav (--voice flag)
> Temp: 0.7 (--temp flag)
> Speed: 1.0 (default)
> ...
Impact
- Low-Medium priority
- Improves transparency and debuggability
- Helps confirm behavior matches expectations
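For the verbose mode proposed above, one possible approach is to record where each value came from while settings are merged. The sketch below is in Python and assumes a precedence of flags over config file over built-in defaults; that order, the `resolve` and `verbose_lines` helpers, and the placeholder default values are all assumptions for illustration, not the actual speak implementation.

```python
def resolve(defaults: dict, config: dict, flags: dict) -> dict:
    """Return {key: (value, source)}, assuming flags > config > defaults."""
    resolved = {key: (value, "default") for key, value in defaults.items()}
    for key, value in config.items():
        if value is not None:  # None means "not set in the config file"
            resolved[key] = (value, "config")
    for key, value in flags.items():
        if value is not None:  # None means "flag not passed"
            resolved[key] = (value, f"--{key} flag")
    return resolved


def verbose_lines(resolved: dict) -> list:
    """Format each setting with its source, in the style of the --verbose output."""
    return [
        f"> {key.capitalize()}: {value} ({source})"
        for key, (value, source) in resolved.items()
    ]


if __name__ == "__main__":
    # Placeholder defaults; the real default temp is not stated in this issue.
    defaults = {
        "model": "mlx-community/chatterbox-turbo-fp16",
        "temp": 0.5,
        "speed": 1.0,
    }
    flags = {"temp": 0.7, "speed": None}
    for line in verbose_lines(resolve(defaults, {}, flags)):
        print(line)
```

Keeping the source label next to each value means the normal banner and the verbose output can both be generated from the same resolved structure.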