
Return flat mono audio arrays #138

Draft: dewana-sl wants to merge 1 commit into KittenML:main from dewana-sl:return-mono-audio-arrays



dewana-sl commented May 21, 2026

Summary

Returns generated mono audio as a flat one-dimensional array of shape (samples,), instead of preserving the singleton channel dimension, shape (1, samples), that comes out of the ONNX model.
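The reshaping itself can be illustrated with a minimal NumPy sketch. The helper name flatten_mono is hypothetical and is not the PR's mono_audio_array; it only assumes the model emits either (1, samples) or (samples,) and rejects anything that is not mono.

```python
import numpy as np


def flatten_mono(audio) -> np.ndarray:
    """Hypothetical helper: collapse (1, samples) or (samples,) mono audio to (samples,) float32."""
    arr = np.asarray(audio, dtype=np.float32)
    if arr.ndim == 2 and arr.shape[0] == 1:
        # Drop the singleton channel axis produced by the model output.
        arr = arr[0]
    elif arr.ndim != 1:
        raise ValueError(f"expected mono audio, got shape {arr.shape}")
    return arr


# Placeholder array standing in for an ONNX output with a singleton channel axis.
onnx_output = np.zeros((1, 24000), dtype=np.float32)
audio = flatten_mono(onnx_output)
print(audio.shape)  # (24000,)
```

Indexing arr[0] is used rather than np.squeeze so that a degenerate (1, 1) input still yields a 1-D array of length 1 instead of a 0-D scalar.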

Why

Audio pipelines that consume the array directly commonly expect mono audio with shape (samples,). With the previous (1, samples) shape, user code such as np.stack([audio, audio], axis=1) produces a 3-D array of shape (1, 2, samples) rather than the expected (samples, channels), which can lead to incorrect playback in downstream audio tools.
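The shape difference is easy to reproduce with plain NumPy. This sketch uses zero-filled placeholder arrays rather than real model output and only compares the shapes that np.stack produces in each case.

```python
import numpy as np

samples = 4
old_style = np.zeros((1, samples), dtype=np.float32)  # previous output shape: (1, samples)
new_style = np.zeros(samples, dtype=np.float32)       # flat output shape: (samples,)

# Duplicating mono into two channels, as in the example above.
old_stereo = np.stack([old_style, old_style], axis=1)
new_stereo = np.stack([new_style, new_style], axis=1)

print(old_stereo.shape)  # (1, 2, 4): 3-D, not (samples, channels)
print(new_stereo.shape)  # (4, 2): (samples, channels)
```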

Addresses #112.

Validation

  • python3 -m unittest -q
  • Editable install in a fresh virtualenv with declared package dependencies.
  • Import smoke test for kittentts, normalize_text, and mono_audio_array.
  • Real inference smoke test with KittenML/kitten-tts-nano-0.8-int8, asserting that the returned audio is a one-dimensional float32 array.
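The shape and dtype assertion from the smoke test could be written as a reusable check, sketched below under stated assumptions: assert_flat_mono_float32 and MonoAudioShapeTest are hypothetical names, and the tests exercise synthetic arrays rather than loading the model. In a real smoke test, the array returned by inference would be passed to the same check.

```python
import unittest

import numpy as np


def assert_flat_mono_float32(audio) -> None:
    """Hypothetical check: audio must be a 1-D float32 array of shape (samples,)."""
    arr = np.asarray(audio)
    # Raise explicitly so the check still runs under `python -O`.
    if arr.ndim != 1:
        raise AssertionError(f"expected shape (samples,), got {arr.shape}")
    if arr.dtype != np.float32:
        raise AssertionError(f"expected float32, got {arr.dtype}")


class MonoAudioShapeTest(unittest.TestCase):
    def test_accepts_flat_float32(self):
        assert_flat_mono_float32(np.zeros(16, dtype=np.float32))

    def test_rejects_singleton_channel_dimension(self):
        # The old (1, samples) layout should fail the check.
        with self.assertRaises(AssertionError):
            assert_flat_mono_float32(np.zeros((1, 16), dtype=np.float32))

    def test_rejects_float64(self):
        with self.assertRaises(AssertionError):
            assert_flat_mono_float32(np.zeros(16, dtype=np.float64))
```

Saved as a test module, these cases would be picked up by python3 -m unittest -q along with the rest of the suite.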
