Attention treats one thing as physics: to look back over L tokens, pay O(L²).
Everyone optimizes on top of that compromise. Harmonic goes under it.

Harmonic stacks three recurrent SSM levels at progressively slower timescales
(τ ≈ 4, 32, 128). Each level receives the prediction error of the level below as
its input: the fast level captures local syntax, the slow level captures long-range structure.
The result is an O(L) recurrence with O(1) memory at inference, whose advantage over
attention grows with sequence length instead of collapsing.

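
To make the three timescales concrete, here is a minimal sketch in plain Python. It assumes a first-order recurrence with per-step decay a = exp(-1/τ), and a sigmoid-parameterized gate whose bias is set to logit(a). Both are illustrative assumptions, and the function names are hypothetical rather than taken from the repository. Under these assumptions, each level's response to a single input falls to about 1/e of its peak after τ steps.

```python
import math

def decay_for_timescale(tau):
    """Per-step decay whose impulse response falls to 1/e after tau steps."""
    return math.exp(-1.0 / tau)

def gate_bias_for_timescale(tau):
    """Bias b such that sigmoid(b) equals the decay above (assumed sigmoid gate)."""
    a = decay_for_timescale(tau)
    return math.log(a / (1.0 - a))

def impulse_response(tau, steps):
    """Run h[t] = a * h[t-1] + x[t] on a unit impulse at t = 0."""
    a = decay_for_timescale(tau)
    h, out = 0.0, []
    for t in range(steps):
        x = 1.0 if t == 0 else 0.0
        h = a * h + x
        out.append(h)
    return out

for tau in (4, 32, 128):
    response = impulse_response(tau, tau + 1)
    print(
        f"tau={tau:>3}  decay={decay_for_timescale(tau):.4f}  "
        f"gate bias={gate_bias_for_timescale(tau):.3f}  "
        f"h[tau]={response[tau]:.4f}"
    )
```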
On enwiki8 at an equal token budget (28M params, 65.5M tokens), Harmonic beats a comparable Transformer and Mamba at every length — and the gap widens as context grows:
| seq | Harmonic | Mamba | Transformer | Harmonic vs TF |
|---|---|---|---|---|
| 1 024 | 6.571 | 6.616 | 6.662 | +1.4 % |
| 2 048 | 6.426 | 6.532 | 6.657 | +3.5 % |
| 4 096 | 6.687 | 6.740 | 7.045 | +5.1 % |
| 8 192 | 6.333 | 6.422 | 6.787 | +6.7 % |
| 16 384 | 6.196 | 6.286 | 6.873 | +9.9 % |
| 32 768 | 6.433 | 6.549 | 7.259 | +11.4 % |
| 65 536 | 6.169 | OOM | OOM | — |

bpt, lower is better. At 64K tokens both baselines run out of memory on an 80 GB H100;
Harmonic trains successfully, a direct consequence of O(L) memory.
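
The "Harmonic vs TF" column can be recomputed from the Harmonic and Transformer columns. A small sketch, assuming the column is the relative bpt reduction (Transformer − Harmonic) / Transformer; the variable and function names are illustrative:

```python
# (seq_len, harmonic_bpt, transformer_bpt), copied from the table above
rows = [
    (1024, 6.571, 6.662),
    (2048, 6.426, 6.657),
    (4096, 6.687, 7.045),
    (8192, 6.333, 6.787),
    (16384, 6.196, 6.873),
    (32768, 6.433, 7.259),
]

def relative_gain(harmonic, transformer):
    """Relative bpt reduction of Harmonic versus the Transformer, in percent."""
    return 100.0 * (transformer - harmonic) / transformer

gains = {seq: round(relative_gain(h, tf), 1) for seq, h, tf in rows}
for seq, gain in gains.items():
    print(f"{seq:>6}: +{gain:.1f} %")
```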

Statistical robustness (seq 8192, 5 seeds): Harmonic 6.515 ± 0.163 · Mamba 6.575 ± 0.155 · Transformer 7.009 ± 0.159. The Transformer range does not overlap with the other two; the Harmonic and Mamba ranges (6.352–6.678 and 6.420–6.730) do overlap. The ordering Harmonic < Mamba < Transformer holds across all five seeds.

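
Whether two mean ± spread ranges overlap is a quick arithmetic check. The sketch below treats each ± value as a symmetric interval around the mean, which is an assumption: the text does not say whether ± is a standard deviation or a confidence interval. The helper names are illustrative.

```python
# mean and reported ± spread at seq 8192 over 5 seeds
results = {
    "Harmonic": (6.515, 0.163),
    "Mamba": (6.575, 0.155),
    "Transformer": (7.009, 0.159),
}

def interval(mean, spread):
    """Symmetric range [mean - spread, mean + spread]."""
    return (mean - spread, mean + spread)

def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

names = list(results)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        hit = overlaps(interval(*results[first]), interval(*results[second]))
        print(f"{first} vs {second}: {'overlap' if hit else 'disjoint'}")
```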
Replication — WikiText-103, the standard benchmark used by Mamba and S4: Harmonic wins
at every length, gap +1.7 % → +7.2 % across 1K–32K.
- Hierarchy is load-bearing. Flatten the timescales (all τ equal) → +0.501 bpt worse.
- Prediction-error routing contributes little. Passing raw states instead of prediction
  errors changes loss by ≤ 0.022 bpt. Stated honestly: the gain comes from the timescale hierarchy.

Hallamonic replaces all attention layers in TinyLlama 1.1B with HarmonicBlock. The RoPE positional
limit disappears: Hallamonic holds stable loss across 1K–8K on two independent clean
benchmarks (Lambada, fineweb-edu held-out), while TinyLlama degrades catastrophically
past its 2K-token RoPE limit (+9.4 bpt at seq 8K on Lambada).


- `HarmonicLevel`: selective SSM with a data-dependent decay gate `A(t)`, initialized to a target timescale τ via a bias trick. `h[t] = A[t]·h[t-1] + b[t]`.
- Three levels, τ ≈ 4 / 32 / 128 (fast syntax / phrase / long-range).
- Inter-level signal: each level receives `h_lower − pred(h_upper)`, a prediction error.
- Output: `h1 + h2 + h3` → LM head.
- Stateful inference: raw pre-LayerNorm state carried across chunks → `O(1)` memory for documents of any length. A Transformer cannot do this.
- Triton scan kernel: parallel SSM scan, ~85× over the naïve loop on H100; pure-PyTorch Hillis–Steele fallback for CPU / no-Triton.
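
The recurrence h[t] = A[t]·h[t-1] + b[t] composes associatively over (A, b) pairs, which is what makes both a parallel scan and chunked, stateful evaluation possible. Here is a minimal pure-Python sketch, assuming scalar states and a zero initial state. The function names are hypothetical, and this is neither the repository's Triton kernel nor its PyTorch fallback. It checks that a Hillis–Steele scan, and the same scan run chunk by chunk with only the last state carried over, both match the naïve loop.

```python
import random

def sequential_scan(A, b, h0=0.0):
    """Reference loop: h[t] = A[t] * h[t-1] + b[t], starting from h0."""
    h, out = h0, []
    for a_t, b_t in zip(A, b):
        h = a_t * h + b_t
        out.append(h)
    return out

def hillis_steele_scan(A, b, h0=0.0):
    """Same recurrence as an inclusive Hillis-Steele scan over (A, b) pairs.

    Each pass combines element i with element i - d. On parallel hardware the
    inner loop runs concurrently, so about log2(n) passes are needed.
    """
    n = len(b)
    if n == 0:
        return []
    a, c = list(A), list(b)
    c[0] = a[0] * h0 + c[0]  # fold the carried-in state into step 0
    d = 1
    while d < n:
        new_a, new_c = a[:], c[:]
        for i in range(d, n):
            # compose the earlier affine map (a[i-d], c[i-d]) with the later one
            new_a[i] = a[i - d] * a[i]
            new_c[i] = a[i] * c[i - d] + c[i]
        a, c = new_a, new_c
        d *= 2
    return c

def chunked_scan(A, b, chunk):
    """Stateful evaluation: only the last state crosses a chunk boundary."""
    h, out = 0.0, []
    for start in range(0, len(b), chunk):
        stop = start + chunk
        part = hillis_steele_scan(A[start:stop], b[start:stop], h0=h)
        out.extend(part)
        h = part[-1]
    return out

rng = random.Random(0)
A = [rng.uniform(0.5, 0.99) for _ in range(37)]
b = [rng.uniform(-1.0, 1.0) for _ in range(37)]
reference = sequential_scan(A, b)
max_err_scan = max(abs(x - y) for x, y in zip(reference, hillis_steele_scan(A, b)))
max_err_chunked = max(abs(x - y) for x, y in zip(reference, chunked_scan(A, b, chunk=8)))
print(max_err_scan, max_err_chunked)
```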

Repository layout:

    harmonic_ssm/
    architecture/        core models + training (train_fast.py = all models + stateful)
    experimental/        scaling / generation / baselines
    hallamonic/          1B scale: HarmonicBlock replacing LlamaAttention, Modal launchers
    harmonic_snn/        spiking-neural-network variant (matches SSM bpt)
    paper/               harmonic.tex, harmonic.pdf, bib, figures
    logs/                H100 result logs behind the tables
    results/SUMMARY.md   full results writeup
    figures/             figures for this README


Quick start:

    python -m venv venv && source venv/bin/activate
    pip install torch  # + triton (optional, for the fast GPU scan)

    # train + compare Harmonic / Mamba / Transformer (CPU-friendly defaults)
    python harmonic_ssm/architecture/train_fast.py --help
    python harmonic_ssm/architecture/train_fast.py

`train_fast.py` contains every model used in the paper (`MinHarmonicFair`, the Flat /
NoPred ablations, `MinMambaFair`, `MinTransformerFlash`) and the stateful variants, behind
one parallel-scan core. Modal launchers in `architecture/` reproduce the H100 crossover,
scaling, and stateful experiments end to end.

If Harmonic helps your work, please cite it. See CITATION.cff.

    @software{harmonic2026,
      title     = {Harmonic: Hierarchical State Space Models},
      author    = {Omibranch and {Harmonic Labs}},
      year      = {2026},
      publisher = {Zenodo},
      doi       = {10.5281/zenodo.20381713},
      url       = {https://github.com/Omibranch/Harmonic}
    }

Concept DOI (always resolves to the latest version): `10.5281/zenodo.20381713` · this version: `10.5281/zenodo.20412249`. Full training logs: https://github.com/Omibranch/harmonic-logs.

Harmonic is dual-licensed:
- Free for noncommercial use — research, study, teaching, evaluation, and any other noncommercial purpose — under the PolyForm Noncommercial License 1.0.0. Cite us and build away.
- Commercial use requires a paid commercial license. Shipping Harmonic (or a derivative) in a product or any revenue-generating context? See `LICENSE-COMMERCIAL.md`.

Questions, commercial inquiries, collaboration: contact@harmoniclabs.cc · harmoniclabs.cc





