fix: resample TTS audio to device native rate to eliminate crackling by Ded1nk · Pull Request #204 · dnhkng/GLaDOS

Ded1nk · 2026-05-27T06:58:25Z

Problem

The TTS model (glados.onnx) outputs audio at 22050 Hz, but most audio devices default to 44100 Hz (or another native rate). When sounddevice opens a stream at 22050 Hz on a 44100 Hz device, PortAudio performs internal sample rate conversion — and its built-in SRC is low quality, producing audible crackling and distortion artifacts.

Fix

Use soxr to resample TTS audio to the output device's native sample rate before playback, bypassing PortAudio's SRC entirely:

device_rate = int(sd.query_devices(kind="output")["default_samplerate"])
if sample_rate != device_rate:
    audio = soxr.resample(audio, sample_rate, device_rate, quality="HQ")
    sample_rate = device_rate

soxr is a well-established, high-quality SRC library (used in FFmpeg, librosa, etc.) and is available on all platforms.

Secondary fix: dual-stream monitoring bug

The original measure_percentage_spoken() called sd.play() to start audio and then opened a second concurrent sd.OutputStream with a callback that filled output with zeros — just to count frames for progress tracking. On macOS CoreAudio (and likely other backends), having two concurrent output streams caused audio mixing artifacts and scheduling jitter.

Fixed by: using a single callback-driven sd.OutputStream that both feeds audio data AND increments _playback_position. measure_percentage_spoken() now simply polls a threading.Event set when the stream finishes.
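A minimal sketch of the single-stream idea described above, not the PR's actual code. The class name `PlaybackTracker` and its attributes are hypothetical; it uses only numpy and threading so the callback logic can be exercised without an audio device. The `callback` method follows the `(outdata, frames, time, status)` signature that sounddevice uses for `sd.OutputStream` callbacks.

```python
import threading

import numpy as np


class PlaybackTracker:
    """Hypothetical helper: feeds audio to an output callback and tracks position."""

    def __init__(self, audio: np.ndarray) -> None:
        self.audio = np.asarray(audio, dtype=np.float32).reshape(-1)  # mono
        self.position = 0              # frames handed to the device so far
        self.done = threading.Event()  # set once the buffer is exhausted

    def callback(self, outdata, frames, time, status) -> None:
        # Copy the next block of audio into the device buffer.
        chunk = self.audio[self.position:self.position + frames]
        outdata[:len(chunk), 0] = chunk
        outdata[len(chunk):, 0] = 0.0  # pad the final block with silence
        self.position += len(chunk)
        if self.position >= len(self.audio):
            self.done.set()
```

With sounddevice this might be wired up as `sd.OutputStream(samplerate=rate, channels=1, callback=tracker.callback)`. A caller can then wait on `tracker.done` and read `tracker.position` instead of opening a second stream just to count frames.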

Testing

Verified on macOS with headphones defaulting to 44100 Hz. Audio went from clearly distorted/crackling to crystal clear after this change.

Dependencies

Added soxr>=0.3.0 to pyproject.toml dependencies.

Summary by CodeRabbit

Chores
- Added audio processing library to project dependencies
Refactor
- Enhanced audio playback with automatic sample rate conversion to match system output device specifications
- Improved playback state tracking and completion detection for more reliable audio behavior

TTS outputs at 22050 Hz but most audio devices run at 44100 Hz (or another native rate). PortAudio's built-in sample rate conversion is low quality and causes audible crackling/distortion artifacts.

Fix: use soxr (high-quality SRC library) to resample audio to the output device's native sample rate before playback.

Also fixes a secondary bug where measure_percentage_spoken() was opening a second concurrent OutputStream filled with zeros alongside the sd.play() call. Having two concurrent output streams caused audio mixing artifacts and scheduling jitter. Replaced with a single callback-driven OutputStream that both plays audio and tracks playback position accurately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai · 2026-05-27T06:58:38Z

Walkthrough

This PR reworks the audio playback system in SoundDeviceAudioIO to use soxr for high-quality resampling and replaces simple sd.play() calls with a custom sd.OutputStream callback and explicit state-based completion tracking.

Changes

Audio playback system refactoring

Layer / File(s)	Summary
Dependency setup for resampling `pyproject.toml`, `src/glados/audio_io/sounddevice_io.py`	`soxr>=0.3.0` is added as a runtime dependency and imported for high-quality audio resampling.
Playback state infrastructure `src/glados/audio_io/sounddevice_io.py`	New state fields track the active output stream (`_output_stream`), buffered audio data (`_playback_audio`), playback position (`_playback_position`), and completion signal (`_playback_done`).
Audio processing and stream initialization `src/glados/audio_io/sounddevice_io.py`	`start_speaking` now converts input to mono `float32`, conditionally resamples to the device's native rate using `soxr` (HQ quality), buffers the audio with position tracking, and initializes an `sd.OutputStream` with a callback that writes audio chunks and signals completion.
Playback progress and completion monitoring `src/glados/audio_io/sounddevice_io.py`	`measure_percentage_spoken` switches from a callback-based progress counter to polling `_is_playing` and `_playback_done`, then computes percentage from the final buffered playback position.
Playback stream termination `src/glados/audio_io/sounddevice_io.py`	`stop_speaking` explicitly stops and closes the active `_output_stream` and sets completion flags, replacing the prior `sd.stop()`-based interruption.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 A stream of audio doth flow so true,
With soxr's magic, resampled anew,
No more callbacks left hanging about,
Just buffers and states, completion rings out! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title clearly and specifically describes the main change: resampling TTS audio to match device native sample rate to fix crackling issues.
Docstring Coverage	✅ Passed	Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/glados/audio_io/sounddevice_io.py`:
- Around line 159-160: _playback_position is tracked in device (resampled)
frames while total_samples is from the original TTS buffer, so progress must be
scaled by the sample-rate ratio; update the progress calculation wherever you
compute percent/completion (references: _playback_position, _playback_audio,
_playback_sample_rate, and total_samples) to multiply or divide by
(_playback_sample_rate / original_sample_rate) (or equivalent inverse) so the
units match before computing percentage, and apply the same change in the other
occurrence noted around the block that handles interruptions (the second spot
mentioned near lines 224-226).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 88dc14cc-e4a3-4e19-b3e4-ff6a32d1bfc9

📥 Commits

Reviewing files that changed from the base of the PR and between 878b144 and 187c272.

📒 Files selected for processing (2)

pyproject.toml
src/glados/audio_io/sounddevice_io.py

coderabbitai · 2026-05-27T07:04:38Z

+        self._playback_audio = audio
+        self._playback_sample_rate = sample_rate


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scale playback progress by sample-rate ratio.

_playback_position is counted in resampled device frames, but total_samples still comes from the original TTS buffer. When the rates differ, interruptions report the wrong percentage back to src/glados/core/speech_player.py—e.g. 22.05 kHz → 44.1 kHz will read roughly 100% at half the utterance.

💡 Proposed fix

 progress = getattr(self, "_playback_position", total_samples)
 self._is_playing = False
-percentage_played = min(int(progress / total_samples * 100), 100) if total_samples > 0 else 100
+if total_samples > 0 and sample_rate and self._playback_sample_rate > 0:
+    played_seconds = progress / self._playback_sample_rate
+    total_seconds = total_samples / sample_rate
+    percentage_played = min(int(played_seconds / total_seconds * 100), 100)
+else:
+    percentage_played = 100
 return interrupted, percentage_played

Also applies to: 224-226
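The unit mismatch can be checked with a small worked example. This is a sketch only: the function name `percentage_played` and its parameter names are hypothetical, and it mirrors the seconds-based comparison in the proposed fix rather than the project's actual method.

```python
def percentage_played(progress_frames: int, playback_rate: int,
                      total_samples: int, source_rate: int) -> int:
    """Percentage of an utterance played, comparing durations in seconds."""
    if total_samples <= 0 or source_rate <= 0 or playback_rate <= 0:
        return 100
    played_seconds = progress_frames / playback_rate
    total_seconds = total_samples / source_rate
    return min(int(played_seconds / total_seconds * 100), 100)


# One second of TTS audio at 22050 Hz becomes 44100 frames at 44100 Hz.
# Halfway through playback the device has consumed 22050 resampled frames.
naive = min(int(22050 / 22050 * 100), 100)              # mixes frame units
scaled = percentage_played(22050, 44100, 22050, 22050)  # compares seconds
print(naive, scaled)  # prints: 100 50
```

Converting both quantities to seconds avoids having to remember which direction the sample-rate ratio should be applied in.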


make the fix. Also, can we avoid the new soxr dependency?

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat pyproject.toml

Length of output: 5630

Happy to make both changes. Here's the plan:

Remove soxr>=0.3.0 from pyproject.toml

Drop the import soxr and replace soxr.resample(...) with a private _resample helper using np.interp (already available — no new dependency)

Fix measure_percentage_spoken to convert both progress and total_samples to seconds before computing percentage

Note on resampling quality: np.interp (linear interpolation) is simpler than soxr's HQ sinc filter. For typical TTS use-cases — especially integer upsampling ratios like 22 050 → 44 100 Hz — it is audibly transparent. If higher quality becomes important later, scipy.signal.resample_poly (scipy is a common optional dep) can be swapped in with a one-line change.
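A rough sketch of what an `np.interp`-based helper along these lines might look like. The function name `resample_linear` is hypothetical, and the PR's eventual `_resample` helper may differ. Linear interpolation applies no dedicated anti-imaging filter, so how it compares with soxr on a given device is something to confirm by listening.

```python
import numpy as np


def resample_linear(audio: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Resample mono audio by linear interpolation (hypothetical helper)."""
    audio = np.asarray(audio, dtype=np.float32)
    if src_rate == dst_rate or audio.size == 0:
        return audio
    n_out = int(round(audio.size * dst_rate / src_rate))
    # Position of each output sample, expressed in input-sample units.
    positions = np.arange(n_out) * (src_rate / dst_rate)
    # np.interp clamps positions past the last input sample to its value.
    return np.interp(positions, np.arange(audio.size), audio).astype(np.float32)
```

For example, `resample_linear(audio, 22050, 44100)` returns an array twice as long as `audio`, with each new sample placed midway between its two neighbours.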

🧬 Code Graph Analysis Results

src/glados/audio_io/sounddevice_io.py

Lines 5-12 (relevant to avoiding the new soxr dependency):

import soxr
from loguru import logger
import numpy as np
from numpy.typing import NDArray
import sounddevice as sd  # type: ignore
from . import VAD

Lines ~104-137 (resampling uses soxr, likely what the user wants to remove):

# Resample to device native rate if needed to avoid low-quality SRC in PortAudio
device_rate = int(sd.query_devices(kind="output")["default_samplerate"])
if sample_rate != device_rate:
    audio = soxr.resample(audio, sample_rate, device_rate, quality="HQ")
    sample_rate = device_rate

Lines ~60-84 (VAD chunk sizing; relevant if the “fix” relates to VAD compatibility):

self.input_stream = sd.InputStream(
    samplerate=self.SAMPLE_RATE,
    channels=1,
    callback=audio_callback,
    blocksize=int(self.SAMPLE_RATE * self.VAD_SIZE / 1000),
)

src/glados/audio_io/vad.py

Lines ~41-88 (VAD constraints that must match microphone chunk sizing / sample rate):

def __call__(self, audio_sample: NDArray[np.float32], sample_rate: int = SAMPLE_RATE) -> NDArray[np.float32]:
    """Process a batch of audio samples and return the VAD output."""
    if sample_rate != 16000:
        raise ValueError("silero_vad_16k_op15.onnx only supports 16000 Hz audio.")
    num_samples = 512
    if audio_sample.shape[-1] != num_samples:
        raise ValueError(
            f"Provided number of samples is {audio_sample.shape[-1]} "
            f"(Supported values: 256 for 8000 sample rate, 512 for 16000)"
        )

Lines ~88-116 (context/stateful inference details):

batch_size = audio_sample.shape[0]
context_size = 64 if sample_rate == 16000 else 32
if not self._last_batch_size:
    self.reset_states(batch_size)
if (self._last_sr) and (self._last_sr != sample_rate):
    self.reset_states(batch_size)
if (self._last_batch_size) and (self._last_batch_size != batch_size):
    self.reset_states(batch_size)
if not len(self._context):
    self._context = np.zeros((batch_size, context_size), dtype=np.float32)
audio_sample = np.concatenate([self._context, audio_sample], axis=1)
ort_inputs = {
    "input": audio_sample.astype(np.float32),
    "state": self._state,
    "sr": np.array(sample_rate, dtype=np.int64),
}
ort_outs = self.ort_sess.run(None, ort_inputs)
out, state = ort_outs

src/glados/audio_io/__init__.py

Lines 26-34 (method signatures for reference while adjusting behavior):

def start_listening(self) -> None: ...
def stop_listening(self) -> None: ...
def start_speaking(
    self, audio_data: NDArray[np.float32], sample_rate: int | None = None, text: str = ""
) -> None: ...
def measure_percentage_spoken(self, total_samples: int, sample_rate: int | None = None) -> tuple[bool, int]: ...
def check_if_speaking(self) -> bool: ...
def stop_speaking(self) -> None: ...
def get_sample_queue(self) -> queue.Queue[tuple[NDArray[np.float32], bool]]: ...

dnhkng · 2026-05-31T19:21:25Z

@Ded1nk Thanks for the PR.

Can you modify this to not add a new dependency? A strong goal of the project is minimal dependencies, and this adds soxr.

Ded1nk · 2026-05-31T19:29:55Z

Indeed it does. This will be my first ever commit to a public project, so let's see if I can do it. Big thanks for helping me on my journey helping you. :)

Ded1nk wants to merge 1 commit into dnhkng:main from Ded1nk:fix/audio-crackle-samplerate
