Speaker State Trajectory analysis via a Karpathy-style autoresearch loop.
Treats a speaker's voice as a nonlinear dynamical system — extracts acoustic features per frame, embeds them into phase space (Takens' theorem), and analyzes the resulting trajectory for attractors, bifurcations, regime changes, and Lyapunov exponents. An LLM drives the research cycle: hypothesize → design experiment → execute → evaluate → reflect → loop.
```bash
# Install
cd sst_autoresearch
pip install -e .

# Configure model in .env (see .env for options)

# Smoke test
python smoke_test.py

# Run autoresearch (2 iterations to start)
python -m src.graph data/wav/your_file.wav --max-iter=2
```

The research loop:

```
START → ingest_audio → hypothesize → design_experiment → run_experiment → evaluate → reflect
reflect → hypothesize          (pivot: try a new direction)
reflect → design_experiment    (deepen: gather more evidence)
reflect → synthesize → END     (conclude: write the report)
```
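The routing above can be sketched in plain Python. The real project wires these nodes into a LangGraph state machine (`src/graph.py`); here a dict-based loop illustrates the same control flow, with the node names taken from the diagram and the reflect decisions ("pivot" / "deepen" / "conclude") as stand-ins for the LLM's choice:

```python
# Illustrative sketch of the autoresearch control flow, not the actual
# LangGraph wiring. Each node is a function state -> state, except
# reflect, which returns one of "pivot", "deepen", or "conclude".

def run_loop(state, nodes, max_iter=2):
    """Run hypothesize → design → execute → evaluate → reflect until
    reflect decides to conclude, or max_iter hypothesis rounds pass."""
    state = nodes["ingest_audio"](state)
    for _ in range(max_iter):
        state = nodes["hypothesize"](state)
        while True:
            state = nodes["design_experiment"](state)
            state = nodes["run_experiment"](state)
            state = nodes["evaluate"](state)
            decision = nodes["reflect"](state)  # "pivot" | "deepen" | "conclude"
            if decision != "deepen":
                break  # pivot or conclude leaves the inner evidence loop
        if decision == "conclude":
            break
    return nodes["synthesize"](state)
```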
All DSP and statistics run as deterministic Python code. The LLM handles hypothesis generation, experiment design, result interpretation, and meta-reasoning about what to explore next.
Per-frame features (25ms frames, 10ms hop) via Parselmouth (Praat) and librosa: F0, jitter, shimmer, HNR, formants F1-F3, MFCC 1-13, spectral centroid, rolloff, flux, RMS energy, zero crossing rate.
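The full feature set goes through Parselmouth and librosa, but the 25 ms / 10 ms framing arithmetic and the simplest per-frame features (RMS energy, zero-crossing rate) can be sketched in plain Python:

```python
# Sketch of 25 ms frames with a 10 ms hop, computing RMS energy and
# zero-crossing rate per frame. The real pipeline uses Parselmouth
# (Praat) and librosa for F0, jitter, formants, MFCCs, etc.
import math

def frame_features(samples, sr, frame_ms=25, hop_ms=10):
    frame = int(sr * frame_ms / 1000)  # e.g. 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)      # e.g. 160 samples at 16 kHz
    feats = []
    for start in range(0, len(samples) - frame + 1, hop):
        w = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in w) / frame)
        # zero-crossing rate: fraction of adjacent pairs with a sign change
        zcr = sum(1 for a, b in zip(w, w[1:]) if (a < 0) != (b < 0)) / (frame - 1)
        feats.append({"rms": rms, "zcr": zcr})
    return feats
```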
Recurrence plots + RQA, Lyapunov exponents (Rosenstein), Grassberger-Procaccia correlation dimension, sample entropy, regime change detection.
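Of the measures above, sample entropy is compact enough to sketch in full. This is a didactic O(N²) version of the standard SampEn definition (template length `m`, Chebyshev distance, tolerance `r`, self-matches excluded); the project's `dynamics.py` presumably uses an optimized implementation:

```python
import math

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r) = -ln(A/B), where B counts pairs of length-m templates
    within tolerance r (Chebyshev distance) and A the same for length m+1.
    r is commonly set to 0.2 * std of the series."""
    n = len(x) - m
    def count(k):
        c = 0
        for i in range(n):
            for j in range(i + 1, n):
                if max(abs(x[i + d] - x[j + d]) for d in range(k)) < r:
                    c += 1
        return c
    b = count(m)      # length-m template matches
    a = count(m + 1)  # length-(m+1) template matches
    if a == 0 or b == 0:
        return float("inf")  # undefined for short or highly irregular series
    return -math.log(a / b)
```

A perfectly regular series gives SampEn 0; irregular series give larger values.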
Time-delay embeddings (Takens' theorem) with automatic optimal τ (AMI) and embedding dimension selection.
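A minimal sketch of the delay embedding, with τ chosen at the first minimum of a crude histogram-based average mutual information. The binning and the simple "first τ where AMI stops decreasing" rule are illustrative stand-ins for the production estimator:

```python
# Takens delay embedding with AMI-based delay selection (didactic version).
import math
from collections import Counter

def ami(x, tau, bins=16):
    """Binned average mutual information between x[t] and x[t + tau]."""
    lo, hi = min(x), max(x)
    scale = (bins - 1) / (hi - lo) if hi > lo else 0.0
    pairs = [(int((a - lo) * scale), int((b - lo) * scale))
             for a, b in zip(x, x[tau:])]
    n = len(pairs)
    pj = Counter(pairs)                 # joint histogram
    pa = Counter(a for a, _ in pairs)   # marginal of x[t]
    pb = Counter(b for _, b in pairs)   # marginal of x[t + tau]
    return sum(c / n * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in pj.items())

def first_minimum_tau(x, max_tau=50):
    """Pick tau at the first local minimum of AMI."""
    prev = ami(x, 1)
    for tau in range(2, max_tau + 1):
        cur = ami(x, tau)
        if cur > prev:
            return tau - 1
        prev = cur
    return max_tau

def delay_embed(x, dim, tau):
    """Delay vectors (x[t], x[t + tau], ..., x[t + (dim-1)*tau])."""
    span = (dim - 1) * tau
    return [tuple(x[t + k * tau] for k in range(dim))
            for t in range(len(x) - span)]
```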
Configured via .env:
| Backend | Setting | Hardware | Model |
|---|---|---|---|
| MLX | `SST_BACKEND=mlx` | Mac Studio M3 Ultra | `mlx-community/Qwen3.5-122B-A10B-bf16` |
| Ollama | `SST_BACKEND=ollama` | NVIDIA DGX | `qwen3.5:122b` |
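A minimal `.env` for the MLX backend might look like the fragment below. `SST_BACKEND` comes from the table above; the model-identifier key name (`SST_MODEL`) is a hypothetical placeholder, so check the shipped `.env` for the actual keys:

```ini
# Backend selection (value from the table above)
SST_BACKEND=mlx
# Model identifier; "SST_MODEL" is a hypothetical key name, see .env for the real one
SST_MODEL=mlx-community/Qwen3.5-122B-A10B-bf16
```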
```
sst_autoresearch/
├── .env                  # Backend + model config
├── smoke_test.py         # Layer-by-layer validation
├── pyproject.toml
├── src/
│   ├── graph.py          # LangGraph state machine
│   ├── llm.py            # LLM interface (MLX + Ollama)
│   ├── nodes/            # Autoresearch loop nodes
│   │   ├── hypothesize.py
│   │   ├── design.py
│   │   ├── execute.py
│   │   ├── evaluate.py
│   │   ├── reflect.py
│   │   └── synthesize.py
│   ├── features/         # DSP pipeline
│   │   ├── acoustic.py   # Parselmouth + librosa extraction
│   │   └── dynamics.py   # RQA, Lyapunov, entropy, regimes
│   └── prompts/          # LLM prompt templates
├── data/wav/             # Input audio files
├── outputs/reports/      # Generated research reports
└── notebooks/
```
- Python ≥ 3.11
- For MLX backend: Apple Silicon Mac with `mlx-lm`
- For Ollama backend: Ollama server with a Qwen3.5 model pulled
- Audio dependencies: `librosa`, `parselmouth`, `scipy`