lostmartian/audioTQ

A minimal, high-performance lossy audio compression engine built in Python and NumPy.

TQA (TurboQuant Audio) adapts the mathematical principles of TurboQuant, originally designed for data-oblivious weight quantization of LLMs on GPU clusters, into a localized, cache-aligned CPU audio codec. Hadamard rotations spread each block of audio samples into an approximately zero-centered, symmetric Gaussian distribution that a fixed codebook can quantize well. In the benchmark below this yields a ~3.9x reduction in size while preserving transient fidelity.


Features

  • Data-Oblivious Energy Flattening: Spreads transient spike energy uniformly across audio blocks using Fast Walsh-Hadamard Transform (FWHT) rotations.
  • Optimal Centroid Clustering: Employs an iterative Lloyd-Max solver to converge on the Mean-Squared-Error (MSE) optimal 6-bit codebook.
  • 1-Bit QJL Residual Layer: Stores the sign of each sample's quantization error as a 1-bit residual, following the Quantized Johnson-Lindenstrauss (QJL) stage of TurboQuant, to track rounding errors and suppress distortion.
  • Minimal Dependencies: Requires only NumPy and SciPy beyond the Python standard library.
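
The first three features can be sketched end to end on a single block. This is an illustrative NumPy-only sketch, not the audiotq implementation: the helper names (`fwht`, `lloyd_max`, `crest`) are hypothetical, the codebook is fitted to synthetic unit-Gaussian samples, and the residual step is a simple sign-plus-mean-magnitude correction used as a stand-in for the QJL layer.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    y = np.asarray(x, dtype=np.float64).copy()
    n = y.size
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            a = y[start:start + h].copy()
            b = y[start + h:start + 2 * h].copy()
            y[start:start + h] = a + b
            y[start + h:start + 2 * h] = a - b
        h *= 2
    return y / np.sqrt(n)  # 1/sqrt(n) scaling makes the transform its own inverse

def lloyd_max(samples, levels=64, iters=30):
    """Alternate nearest-centroid assignment and centroid (mean) updates."""
    # Quantile initialisation keeps every cell populated from the start.
    centroids = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        edges = (centroids[:-1] + centroids[1:]) / 2  # midpoint decision boundaries
        cell = np.searchsorted(edges, samples)
        for k in range(levels):
            members = samples[cell == k]
            if members.size:
                centroids[k] = members.mean()
    return centroids

def crest(v):
    """Peak-to-RMS ratio: large for spiky blocks, small for flat ones."""
    return np.max(np.abs(v)) / np.sqrt(np.mean(v ** 2))

rng = np.random.default_rng(0)
n = 512

# 6-bit codebook (64 levels) fitted to unit-variance Gaussian samples (assumption).
codebook = lloyd_max(rng.standard_normal(20_000))
edges = (codebook[:-1] + codebook[1:]) / 2

# Test block: quiet noise plus one transient spike.
block = 0.01 * rng.standard_normal(n)
block[100] = 1.0
signs = rng.choice([-1.0, 1.0], size=n)  # random sign flips before the rotation

# 1. Rotate: the spike's energy spreads over all coefficients.
rotated = fwht(signs * block)
scale = rotated.std()
z = rotated / scale

# 2. Quantize each standardized coefficient to its nearest centroid.
indices = np.searchsorted(edges, z)
coarse = codebook[indices]

# 3. Residual sign layer (stand-in): 1 bit per sample plus one step size per block.
error = z - coarse
sign_bits = np.where(error >= 0, 1.0, -1.0)
step = np.mean(np.abs(error))
refined = coarse + sign_bits * step

# Decode: undo scaling, rotation, and sign flips.
decoded = signs * fwht(refined * scale)

sqnr_db = 10 * np.log10(np.sum(block ** 2) / np.sum((block - decoded) ** 2))
print(f"crest factor before/after rotation: {crest(block):.1f} / {crest(rotated):.1f}")
print(f"SQNR after decode: {sqnr_db:.1f} dB")
```

Because the rotation is orthonormal, quantization error in the rotated domain carries over to the decoded block with the same energy, so the codebook only ever has to handle flattened coefficients.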

Installation

Install the package directly from PyPI:

pip install audiotq

Quick Start (Python Library Usage)

You can easily integrate audiotq into your own Python audio processing pipelines:

import numpy as np
from audiotq import TurboAudioEngine

# 1. Initialize the codec engine
engine = TurboAudioEngine(block_size=512)

# 2. Prepare your floating-point audio signal (normalized between -1.0 and 1.0)
raw_signal = np.random.normal(0, 0.2, 8000).astype(np.float32)

# 3. Compress the signal
compressed_blocks, meta_scales = engine.compress_signal(raw_signal)

# 4. Decompress back to audio amplitudes
reconstructed_signal = engine.decompress_signal(compressed_blocks, meta_scales)

Command Line Interface (CLI)

The package installs two command-line entry points, tqa-cli and tqa-sim:

1. Compress and Decompress Audio

Process any standard .wav audio track end-to-end:

tqa-cli run -i input.wav -o output_reconstructed.wav

2. Compare Reconstructed Audio

Compute fidelity metrics (MSE, SQNR, correlation, and envelope preservation) between the raw and processed signals:

tqa-cli compare -f1 input.wav -f2 output_reconstructed.wav
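
For readers who want to see what these metrics measure, here is a minimal NumPy sketch. It is not the code behind tqa-cli compare: the function name `fidelity_metrics` is hypothetical, and "envelope preservation" is assumed here to mean the correlation between moving-RMS envelopes, which may differ from the definition the tool uses.

```python
import numpy as np

def fidelity_metrics(reference, processed, window=256):
    """Compare two signals; the envelope metric is an assumed definition."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(processed, dtype=np.float64)
    n = min(ref.size, out.size)          # tolerate small length mismatches
    ref, out = ref[:n], out[:n]

    noise = ref - out
    mse = np.mean(noise ** 2)
    # SQNR: signal power over quantization-noise power, in decibels.
    sqnr_db = 10 * np.log10(np.mean(ref ** 2) / mse) if mse > 0 else np.inf
    correlation = np.corrcoef(ref, out)[0, 1]

    # Moving-RMS envelope of each signal, then their correlation.
    kernel = np.ones(window) / window
    env_ref = np.sqrt(np.convolve(ref ** 2, kernel, mode="same"))
    env_out = np.sqrt(np.convolve(out ** 2, kernel, mode="same"))
    envelope_corr = np.corrcoef(env_ref, env_out)[0, 1]

    return {"mse": mse, "sqnr_db": sqnr_db,
            "correlation": correlation, "envelope_corr": envelope_corr}

# Example: an amplitude-modulated tone versus a slightly noisy copy.
rng = np.random.default_rng(0)
t = np.arange(44_100) / 44_100
reference = np.sin(2 * np.pi * 440 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 2 * t))
processed = reference + 0.01 * rng.standard_normal(t.size)
metrics = fidelity_metrics(reference, processed)
print({name: round(float(value), 4) for name, value in metrics.items()})
```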

3. Run Synthetic Simulations

Generate synthetic test signals (e.g., sine waves, square waves, noise, transients) with configurable parameters:

tqa-sim --type square --frequency 440 --duration 2.0 --spikes 5
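
As a rough picture of the kind of signal that command describes, the sketch below builds a 440 Hz square wave with five full-scale transient spikes. It is an assumption-laden illustration, not tqa-sim's implementation: the function name, the 44.1 kHz sample rate, the 0.5 base amplitude, and the single-sample spike shape are all hypothetical choices.

```python
import numpy as np

def square_with_spikes(frequency=440.0, duration=2.0, spikes=5,
                       sample_rate=44_100, seed=0):
    """Half-scale square wave with a few full-scale single-sample spikes."""
    rng = np.random.default_rng(seed)
    n = int(round(duration * sample_rate))
    t = np.arange(n) / sample_rate
    # Square wave from the sign of a sine, kept at half scale to leave headroom.
    signal = 0.5 * np.where(np.sin(2 * np.pi * frequency * t) >= 0, 1.0, -1.0)
    # Overwrite distinct random positions with +/-1.0 transients.
    positions = rng.choice(n, size=spikes, replace=False)
    signal[positions] = rng.choice([-1.0, 1.0], size=spikes)
    return signal.astype(np.float32), positions

signal, spike_positions = square_with_spikes()
print(signal.shape, signal.dtype, np.sort(spike_positions))
```

A signal like this is a useful stress test because the isolated spikes are exactly the transients that the Hadamard rotation is meant to spread out.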

Performance Benchmarks

The figures below were measured on a 15-second audio sample at 44.1 kHz:

| Metric | Performance Profile |
| --- | --- |
| Original Dataset Size | 2.52 MB (15.0 seconds) |
| Compressed On-Disk Footprint | 0.65 MB |
| Compression Ratio | ~3.91x smaller footprint (74.4% reduction) |
| Fidelity (SQNR) | 30.24 dB |
| Compression Throughput | ~1.32 MB/s |
| Decompression Throughput | ~1.35 MB/s |
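
The size figures can be sanity-checked with a little arithmetic. The sketch below assumes the input was mono float32 audio and that "MB" means 2^20 bytes; neither is stated above, but together they reproduce the 2.52 MB original size. Recomputing from the rounded sizes gives about 3.88x and 74.2%, so the reported ~3.91x and 74.4% presumably come from unrounded byte counts.

```python
sample_rate = 44_100
seconds = 15.0
bytes_per_sample = 4        # assumption: mono float32 input
MIB = 1024 ** 2             # assumption: "MB" in the table means 2**20 bytes

samples = int(sample_rate * seconds)
raw_mib = samples * bytes_per_sample / MIB

original_mib = 2.52         # rounded value from the table
compressed_mib = 0.65       # rounded value from the table

ratio = original_mib / compressed_mib
reduction = 1 - compressed_mib / original_mib
bits_per_sample = compressed_mib * MIB * 8 / samples

print(f"raw size from sample count: {raw_mib:.2f} MiB")
print(f"ratio from rounded sizes:   {ratio:.2f}x ({reduction:.1%} reduction)")
print(f"effective bits per sample:  {bits_per_sample:.2f}")
```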

Known Limitations & Failure Modes

1. WHT Basis Alignment (Extreme Sparsity Failure)

  • Failure Scenario: If an input block aligns exactly with one of the Walsh-Hadamard basis vectors (e.g. signal = rotator.hadamardSigns), the rotation concentrates all of the block's energy into one coefficient, so the rotated vector becomes a single extreme Kronecker delta spike.
  • Result: Because the Lloyd-Max codebook is optimized for normal distributions, it clips this spike to the outermost centroid ($\pm 2.41$ standard deviations). The clipping noise destroys reconstruction quality, dropping the SQNR to ~1.31 dB. (Demonstrated in tests/test_failures.py::test_failure_hadamard_basis_alignment.)
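
The collapse is easy to reproduce with a dense Hadamard matrix. The sketch below is a simplified model rather than the engine's code: it assumes the block is standardized after rotation, models only the clipping at $\pm 2.41$ (no codebook rounding and no residual layer), and uses a hypothetical random sign vector in place of rotator.hadamardSigns. Under those assumptions the decoded SQNR comes out at roughly 1 dB.

```python
import numpy as np

n = 512
# Build the orthonormal Sylvester-Hadamard matrix by repeated Kronecker products.
H = np.array([[1.0]])
while H.shape[0] < n:
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
H /= np.sqrt(n)

rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=n)   # stand-in for the rotator's sign vector
block = signs.copy()                      # worst case: block equals the sign vector

# The sign flips cancel, leaving an all-ones vector that H maps to one coefficient.
rotated = H @ (signs * block)
scale = rotated.std()
z = rotated / scale

# Model only the saturation at the outermost centroid.
clip_level = 2.41
clipped = np.clip(z, -clip_level, clip_level)

decoded = signs * (H @ (clipped * scale))
sqnr_db = 10 * np.log10(np.sum(block ** 2) / np.sum((block - decoded) ** 2))
print(f"largest coefficient: {np.abs(z).max():.1f} standard deviations")
print(f"SQNR after clipping: {sqnr_db:.2f} dB")
```

The single surviving coefficient sits more than 20 standard deviations out, far beyond anything a codebook fitted to a Gaussian can represent.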

2. Silent Block Edge Case

  • Edge Case: Silent blocks have a standard deviation of 0.0. Dividing by this value during block standardization would produce NaN or Inf values.
  • Resolution: The engine normalizes a block only when it passes a safety threshold (std_dev > 1e-6). Silent blocks bypass normalization and are reconstructed as perfect silence. (Covered by tests/test_failures.py::test_boundary_silent_signal.)
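
A guard of this kind might look like the following sketch. Only the 1e-6 threshold comes from the description above; the function names `standardize_block` and `restore_block` are hypothetical and the engine's actual code path may be organized differently.

```python
import numpy as np

STD_THRESHOLD = 1e-6  # threshold quoted above

def standardize_block(block):
    """Return (standardized block, scale); silent blocks skip the division."""
    block = np.asarray(block, dtype=np.float64)
    std_dev = float(np.std(block))
    if std_dev > STD_THRESHOLD:
        return block / std_dev, std_dev
    # Silent block: avoid dividing by ~0 and record a zero scale instead.
    return np.zeros_like(block), 0.0

def restore_block(standardized, std_dev):
    """Invert standardize_block; a zero scale decodes to exact silence."""
    if std_dev > STD_THRESHOLD:
        return standardized * std_dev
    return np.zeros_like(standardized)

silent = np.zeros(512)
z_silent, scale_silent = standardize_block(silent)
restored_silent = restore_block(z_silent, scale_silent)

ramp = np.linspace(-1.0, 1.0, 512)
z_ramp, scale_ramp = standardize_block(ramp)
restored_ramp = restore_block(z_ramp, scale_ramp)
print(scale_silent, np.isfinite(z_silent).all(), np.allclose(restored_ramp, ramp))
```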

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

About

Porting TurboQuant’s LLM weight-quantization mechanics (Hadamard rotations + Lloyd-Max centroids) to localized CPU audio processing
