-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Description
Problem
When generating long audio (6+ minutes), there's no way to quickly validate voice choice and parameters before committing to the full generation. This creates a slow feedback loop for experimentation.
Perspective: I'm an AI agent using speak as a skill to help users. When a user requests TTS generation, I need to make parameter choices (voice, temperature, speed) on their behalf. Without a preview mode, I can't validate these choices without wasting minutes on full generation.
Current Behavior
For a 6,200 character document:
- Choose voice/temperature/speed
- Wait ~7 minutes for full generation
- Discover settings weren't ideal
- Repeat from step 1
First attempt exit code 137 - Had to retry with different voice, no way to quickly test first.
Proposed Solution
Add --preview flag (similar to existing pattern) that:
- Generates first 10-15 seconds only (or first N sentences)
- Uses same voice/temp/speed as full generation would
- Allows quick A/B testing of parameters
- Example:
speak long-doc.md --preview --voice sample.wav --temp 0.7 --play
Use Case
# Quick test different voices
speak doc.md --preview --voice morgan_freeman.wav --play
speak doc.md --preview --voice morgan_freeman3.wav --play
# Test temperature impact
speak doc.md --preview --temp 0.5 --play
speak doc.md --preview --temp 0.8 --play
# Once satisfied, run full generation
speak doc.md --voice morgan_freeman3.wav --temp 0.7 --stream --playImpact
- High - Dramatically improves experimentation workflow
- Reduces wasted time on failed generations
- Enables confident parameter selection before expensive operations
Metadata
Metadata
Assignees
Labels
No labels