Harmonic: Hierarchical State Space Models

An O(L) recurrence that gets better as context grows — not worse.

The wall we broke

Attention treats one thing as physics: to look back over L tokens, pay O(L²). Everyone optimizes on top of that compromise. Harmonic goes under it.

Harmonic stacks three recurrent SSM levels at progressively slower timescales (τ ≈ 4, 32, 128). Each level receives the prediction error of the level below as its input: the fast level captures local syntax, and the slow level captures long-range structure. The result is an O(L) recurrence with O(1) memory at inference whose advantage over attention grows with sequence length instead of collapsing.
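As a rough illustration of the hierarchy described above, the sketch below runs three scalar levels in plain Python. It is not the repository's HarmonicLevel: the fixed decay exp(-1/τ), the (1 - a) input scaling, the scalar `pred_weight` standing in for the learned predictor, and the function names are all assumptions made for this example. What it shows is that the carried state is three numbers no matter how many tokens have been processed.

```python
import math

TAUS = (4.0, 32.0, 128.0)  # timescales quoted in the text


def decay_for(tau):
    # Assumption: a level with timescale tau retains exp(-1/tau) of its state per step.
    return math.exp(-1.0 / tau)


def step(state, x, pred_weight=1.0):
    """Advance all three levels by one token.

    state is a tuple (h1, h2, h3); returns (new_state, output).
    pred_weight is a hypothetical stand-in for the learned predictor pred(.).
    """
    new_state = []
    signal = x  # the fastest level sees the raw input
    for level, (h_prev, tau) in enumerate(zip(state, TAUS)):
        if level > 0:
            # Prediction error: lower-level state minus this level's prediction of it.
            signal = new_state[level - 1] - pred_weight * h_prev
        a = decay_for(tau)
        h = a * h_prev + (1.0 - a) * signal
        new_state.append(h)
    return tuple(new_state), sum(new_state)


state = (0.0, 0.0, 0.0)
for _ in range(1000):
    state, y = step(state, 1.0)  # constant input, just to exercise the loop
print(len(state), [round(h, 3) for h in state])
```

The loop never stores past tokens, so per-token cost and memory stay constant, which is the property the O(L) time and O(1) inference-memory claims rest on.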

Headline result

On enwiki8 at an equal token budget (28M params, 65.5M tokens), Harmonic beats a comparable Transformer and Mamba at every length — and the gap widens as context grows:

seq      Harmonic  Mamba  Transformer  Harmonic vs TF
1 024    6.571     6.616  6.662        +1.4 %
2 048    6.426     6.532  6.657        +3.5 %
4 096    6.687     6.740  7.045        +5.1 %
8 192    6.333     6.422  6.787        +6.7 %
16 384   6.196     6.286  6.873        +9.9 %
32 768   6.433     6.549  7.259        +11.4 %
65 536   6.169     OOM    OOM

bpt, lower is better. At 64K tokens both baselines run out of memory on an 80 GB H100; Harmonic trains successfully — a direct consequence of O(L) memory.
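The "Harmonic vs TF" column can be reproduced from the bpt values in the table. The snippet below assumes the column is the reduction in bpt relative to the Transformer, (TF - Harmonic) / TF; that formula is inferred from the numbers, not stated in the source, and the names `ROWS` and `relative_gain` are illustrative only.

```python
# (seq, Harmonic bpt, Transformer bpt), copied from the table above
ROWS = [
    (1024, 6.571, 6.662),
    (2048, 6.426, 6.657),
    (4096, 6.687, 7.045),
    (8192, 6.333, 6.787),
    (16384, 6.196, 6.873),
    (32768, 6.433, 7.259),
]


def relative_gain(harmonic_bpt, transformer_bpt):
    """Percent by which Harmonic's bpt is lower than the Transformer's."""
    return 100.0 * (transformer_bpt - harmonic_bpt) / transformer_bpt


gains = [round(relative_gain(h, tf), 1) for _, h, tf in ROWS]
for (seq, _, _), gain in zip(ROWS, gains):
    print(f"{seq:>6}  +{gain:.1f} %")
```

Under that assumption the computed values match the column to one decimal place and increase monotonically with sequence length.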

[Figure: crossover, O(L) vs O(L²)]

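Statistical robustness (seq 8192, 5 seeds): Harmonic 6.515 ± 0.163 · Mamba 6.575 ± 0.155 · Transformer 7.009 ± 0.159. Both SSM intervals are clear of the Transformer interval (upper bounds 6.678 and 6.730 against a lower bound of 6.850). The Harmonic and Mamba intervals overlap (6.352–6.678 vs 6.420–6.730), so that gap lies within the reported spread. The ordering Harmonic < Mamba < Transformer holds across all five seeds.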

Replication on WikiText-103, the standard benchmark used by Mamba and S4: Harmonic wins at every length, with a gap of +1.7 % to +7.2 % across 1K–32K.

Why it works (ablations)

[Figure: ablation]
  • Hierarchy is load-bearing. Flatten the timescales (all τ equal) → +0.501 bpt worse.
  • Prediction-error routing contributes little. Passing raw states instead of prediction errors changes loss by ≤ 0.022 bpt, so the gain comes from the timescale hierarchy rather than from the routing.

Scaling to 1B — Hallamonic

Hallamonic replaces every attention layer in TinyLlama 1.1B with a HarmonicBlock. The RoPE positional limit disappears: Hallamonic holds stable loss across 1K–8K on two independent clean benchmarks (LAMBADA, fineweb-edu held-out), while TinyLlama degrades catastrophically past its 2K-token RoPE limit (+9.4 bpt at seq 8K on LAMBADA).

[Figures: Hallamonic 1B; Hallamonic advantage]

Architecture

  • HarmonicLevel — selective SSM with a data-dependent decay gate A(t), initialized to a target timescale τ via a bias trick. h[t] = A[t]·h[t-1] + b[t].
  • Three levels, τ ≈ 4 / 32 / 128 (fast syntax / phrase / long-range).
  • Inter-level signal — each level receives h_lower − pred(h_upper), a prediction error.
  • Output: h1 + h2 + h3 → LM head.
  • Stateful inference — raw pre-LayerNorm state carried across chunks → O(1) memory for documents of any length. A Transformer cannot do this.
  • Triton scan kernel — parallel SSM scan, ~85× over the naïve loop on H100; pure-PyTorch Hillis–Steele fallback for CPU / no-Triton.
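The recurrence h[t] = A[t]·h[t-1] + b[t] from the list above can be evaluated with a Hillis–Steele scan because composing two affine updates is associative. The sketch below is plain Python on scalars, not the repository's Triton kernel or PyTorch fallback. It assumes the decay gate is a sigmoid and that the "bias trick" means setting the bias to logit(exp(-1/τ)); both are guesses for illustration, and every function name here is hypothetical. It also checks that carrying the last state across two chunks reproduces the single-pass result.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bias_for_timescale(tau):
    # Assumed form of the bias trick: choose bias so sigmoid(bias) == exp(-1/tau).
    a = math.exp(-1.0 / tau)
    return math.log(a / (1.0 - a))


def combine(left, right):
    # Compose h -> a1*h + b1 (applied first) with h -> a2*h + b2 (applied second).
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def scan_parallel(A, b, h0=0.0):
    """Hillis-Steele inclusive scan; each pass is elementwise and could run in parallel."""
    elems = list(zip(A, b))
    d = 1
    while d < len(elems):
        elems = [
            elems[i] if i < d else combine(elems[i - d], elems[i])
            for i in range(len(elems))
        ]
        d *= 2
    # Element t now holds the composed map for steps 0..t; apply it to h0.
    return [a * h0 + c for a, c in elems]


def scan_sequential(A, b, h0=0.0):
    """Reference implementation: the naive loop."""
    out, h = [], h0
    for a_t, b_t in zip(A, b):
        h = a_t * h + b_t
        out.append(h)
    return out


bias = bias_for_timescale(32.0)
T = 16
# Data-dependent gate: the bias plus a small made-up input-dependent term.
A = [sigmoid(bias + 0.1 * math.sin(t)) for t in range(T)]
b = [math.cos(t) for t in range(T)]

full = scan_parallel(A, b)
# Stateful processing: run the first chunk, then seed the second with its last state.
first = scan_parallel(A[:8], b[:8])
second = scan_parallel(A[8:], b[8:], h0=first[-1])
chunked = first + second
```

Only `first[-1]` crosses the chunk boundary, so the memory needed between chunks does not depend on document length.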

Repository layout

harmonic_ssm/
  architecture/      core models + training (train_fast.py = all models + stateful)
  experimental/      scaling / generation / baselines
hallamonic/          1B scale: HarmonicBlock replacing LlamaAttention, Modal launchers
harmonic_snn/        spiking-neural-network variant (matches SSM bpt)
paper/               harmonic.tex, harmonic.pdf, bib, figures
logs/                H100 result logs behind the tables
results/SUMMARY.md   full results writeup
figures/             figures for this README

Quickstart

python -m venv venv && source venv/bin/activate
pip install torch        # + triton (optional, for the fast GPU scan)

# train + compare Harmonic / Mamba / Transformer (CPU-friendly defaults)
python harmonic_ssm/architecture/train_fast.py --help
python harmonic_ssm/architecture/train_fast.py

train_fast.py contains every model used in the paper (MinHarmonicFair, the Flat / NoPred ablations, MinMambaFair, MinTransformerFlash) and the stateful variants, behind one parallel-scan core. Modal launchers in architecture/ reproduce the H100 crossover, scaling, and stateful experiments end to end.

Cite

If Harmonic helps your work, please cite it. See CITATION.cff.

@software{harmonic2026,
  title     = {Harmonic: Hierarchical State Space Models},
  author    = {Omibranch and {Harmonic Labs}},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20381713},
  url       = {https://github.com/Omibranch/Harmonic}
}

Concept DOI (always resolves to the latest version): 10.5281/zenodo.20381713 · this version: 10.5281/zenodo.20412249. Full training logs: https://github.com/Omibranch/harmonic-logs.

License

Harmonic is dual-licensed:

  • Free for noncommercial use — research, study, teaching, evaluation, and any other noncommercial purpose — under the PolyForm Noncommercial License 1.0.0. Cite us and build away.
  • Commercial use requires a paid commercial license. Shipping Harmonic (or a derivative) in a product or any revenue-generating context? See LICENSE-COMMERCIAL.md.

Questions, commercial inquiries, collaboration: contact@harmoniclabs.cc · harmoniclabs.cc

Harmonic Labs — we break walls. The public record is the proof we were here first.
