Harmonic: Hierarchical State Space Models

An O(L) recurrence that gets better as context grows — not worse.

The wall we broke

Attention treats one thing as physics: to look back over L tokens, pay O(L²). Everyone optimizes on top of that compromise. Harmonic goes under it.

Harmonic stacks three recurrent SSM levels at progressively slower timescales (τ ≈ 4, 32, 128). Each level takes the prediction error of the level below as its input: the fast level captures local syntax, the slow level captures long-range structure. The result is an O(L) recurrence with O(1) memory at inference whose advantage over attention grows with sequence length instead of collapsing.
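
To make the level-to-level wiring concrete, here is a minimal scalar sketch, not the repository's implementation. It assumes fixed decays of exp(-1/τ) (the real decay gate is data-dependent), a hypothetical constant `pred_gain` standing in for the learned predictor, and the illustrative names `decay_for` and `run_hierarchy`.

```python
import math

TAUS = (4.0, 32.0, 128.0)  # timescales quoted in the text


def decay_for(tau: float) -> float:
    """Assumed mapping: with decay exp(-1/tau), a state shrinks to 1/e after tau steps."""
    return math.exp(-1.0 / tau)


def run_hierarchy(xs, pred_gain=0.5):
    """Run three scalar recurrent levels over the sequence xs.

    pred_gain is a hypothetical stand-in for the learned predictor pred(.).
    Returns the per-step outputs h1 + h2 + h3.
    """
    a1, a2, a3 = (decay_for(t) for t in TAUS)
    h1 = h2 = h3 = 0.0  # the entire carried state: three numbers
    outputs = []
    for x in xs:
        # Level 1 (fast) sees the raw input.
        h1 = a1 * h1 + (1.0 - a1) * x
        # Level 2 sees what level 1 holds minus what level 2 predicted.
        e1 = h1 - pred_gain * h2
        h2 = a2 * h2 + (1.0 - a2) * e1
        # Level 3 sees the level-2 residual in the same way.
        e2 = h2 - pred_gain * h3
        h3 = a3 * h3 + (1.0 - a3) * e2
        outputs.append(h1 + h2 + h3)
    return outputs
```

The loop does constant work per token and carries only `h1`, `h2`, `h3` between steps, which is where the O(L) time and O(1) inference memory in this sketch come from.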

Headline result

On enwik8 at an equal token budget (28M params, 65.5M tokens), Harmonic beats a comparable Transformer and Mamba at every length, and the gap to the Transformer widens as context grows:

seq	Harmonic	Mamba	Transformer	Harmonic vs TF
1 024	6.571	6.616	6.662	+1.4 %
2 048	6.426	6.532	6.657	+3.5 %
4 096	6.687	6.740	7.045	+5.1 %
8 192	6.333	6.422	6.787	+6.7 %
16 384	6.196	6.286	6.873	+9.9 %
32 768	6.433	6.549	7.259	+11.4 %
65 536	6.169	OOM	OOM	—

bpt, lower is better. At 64K tokens both baselines run out of memory on an 80 GB H100; Harmonic trains successfully — a direct consequence of O(L) memory.
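
The "Harmonic vs TF" column appears to be the relative reduction in bpt against the Transformer, (TF − Harmonic) / TF. That formula is an assumption, but it reproduces every row of the table; `relative_gain` is an illustrative name.

```python
def relative_gain(harmonic_bpt: float, transformer_bpt: float) -> float:
    """Percent by which Harmonic's bpt is lower than the Transformer's."""
    return 100.0 * (transformer_bpt - harmonic_bpt) / transformer_bpt


# (sequence length, Harmonic bpt, Transformer bpt), copied from the table
rows = [
    (1024, 6.571, 6.662),
    (2048, 6.426, 6.657),
    (4096, 6.687, 7.045),
    (8192, 6.333, 6.787),
    (16384, 6.196, 6.873),
    (32768, 6.433, 7.259),
]

gains = [round(relative_gain(h, t), 1) for _, h, t in rows]
print(gains)  # [1.4, 3.5, 5.1, 6.7, 9.9, 11.4]
```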

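Statistical robustness (seq 8192, 5 seeds): Harmonic 6.515 ± 0.163 · Mamba 6.575 ± 0.155 · Transformer 7.009 ± 0.159. The Transformer interval (6.850 to 7.168) does not overlap either SSM's. The Harmonic (6.352 to 6.678) and Mamba (6.420 to 6.730) intervals do overlap, so that comparison rests on the ordering Harmonic < Mamba < Transformer, which holds across all five seeds.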

Replication on WikiText-103, the standard benchmark used by Mamba and S4: Harmonic wins at every length, with the gap growing from +1.7 % at 1K to +7.2 % at 32K.

Why it works (ablations)

Hierarchy is load-bearing. Flatten the timescales (all τ equal) → +0.501 bpt worse.
Prediction-error routing contributes little. Passing raw states instead of prediction errors changes loss by ≤ 0.022 bpt, so the gain comes from the timescale hierarchy rather than from the error signal.

Scaling to 1B — Hallamonic

Replace all attention layers in TinyLlama 1.1B with HarmonicBlock. The RoPE positional limit disappears: Hallamonic holds stable loss across 1K–8K on two independent clean benchmarks (Lambada, fineweb-edu held-out), while TinyLlama degrades catastrophically past its 2K-token RoPE limit (+9.4 bpt at seq 8K on Lambada).

Architecture

HarmonicLevel — selective SSM with a data-dependent decay gate A(t), initialized to a target timescale τ via a bias trick. h[t] = A[t]·h[t-1] + b[t].
Three levels, τ ≈ 4 / 32 / 128 (fast syntax / phrase / long-range).
Inter-level signal — each level receives h_lower − pred(h_upper), a prediction error.
Output — h1 + h2 + h3 → LM head.
Stateful inference — raw pre-LayerNorm state carried across chunks → O(1) memory for documents of any length. A Transformer cannot do this.
Triton scan kernel — parallel SSM scan, ~85× over the naïve loop on H100; pure-PyTorch Hillis–Steele fallback for CPU / no-Triton.
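
The recurrence, the bias initialization, the parallel scan and the chunk-to-chunk state carry can be sketched together in pure Python. This is a hedged illustration, not the repository's kernel. It assumes the decay gate is a sigmoid, so the "bias trick" is taken to mean choosing the bias whose sigmoid equals exp(-1/τ). All function names are hypothetical, and LayerNorm is omitted.

```python
import math


def bias_for_timescale(tau: float) -> float:
    """Bias b with sigmoid(b) == exp(-1/tau) (assumed form of the bias trick)."""
    a = math.exp(-1.0 / tau)
    return math.log(a / (1.0 - a))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def naive_scan(a, b, h0=0.0):
    """Reference loop: h[t] = a[t] * h[t-1] + b[t]."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def hillis_steele_scan(a, b, h0=0.0):
    """Same recurrence in ceil(log2(L)) doubling passes.

    Each position holds an affine map h -> A*h + B. Composing the map at
    position t - step with the map at position t doubles the span covered.
    """
    A, B = list(a), list(b)
    step = 1
    while step < len(A):
        new_A, new_B = A[:], B[:]
        for t in range(step, len(A)):  # every t is independent, hence parallelizable
            new_A[t] = A[t] * A[t - step]
            new_B[t] = A[t] * B[t - step] + B[t]
        A, B = new_A, new_B
        step *= 2
    return [A_t * h0 + B_t for A_t, B_t in zip(A, B)]


def chunked_scan(a, b, chunk):
    """Process the sequence chunk by chunk, carrying only the last state."""
    h, out = 0.0, []
    for i in range(0, len(a), chunk):
        part = hillis_steele_scan(a[i:i + chunk], b[i:i + chunk], h0=h)
        out.extend(part)
        h = part[-1]  # the only value that crosses the chunk boundary
    return out


# Data-dependent decay around tau = 32; the 0.1 * cos(t) term is a made-up input signal.
bias = bias_for_timescale(32.0)
a = [sigmoid(bias + 0.1 * math.cos(t)) for t in range(37)]
b = [math.sin(t) for t in range(37)]
ref = naive_scan(a, b)
```

Under these assumptions `hillis_steele_scan(a, b)` and `chunked_scan(a, b, 8)` both agree with `ref` up to floating-point rounding, which is the property that lets a document be processed in fixed-size chunks with constant carried state.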

Repository layout

harmonic_ssm/
  architecture/      core models + training (train_fast.py = all models + stateful)
  experimental/      scaling / generation / baselines
hallamonic/          1B scale: HarmonicBlock replacing LlamaAttention, Modal launchers
harmonic_snn/        spiking-neural-network variant (matches SSM bpt)
paper/               harmonic.tex, harmonic.pdf, bib, figures
logs/                H100 result logs behind the tables
results/SUMMARY.md   full results writeup
figures/             figures for this README

Quickstart

python -m venv venv && source venv/bin/activate
pip install torch        # + triton (optional, for the fast GPU scan)

# train + compare Harmonic / Mamba / Transformer (CPU-friendly defaults)
python harmonic_ssm/architecture/train_fast.py --help
python harmonic_ssm/architecture/train_fast.py

train_fast.py contains every model used in the paper (MinHarmonicFair, the Flat / NoPred ablations, MinMambaFair, MinTransformerFlash) and the stateful variants, behind one parallel-scan core. Modal launchers in architecture/ reproduce the H100 crossover, scaling, and stateful experiments end to end.

Cite

If Harmonic helps your work, please cite it. See CITATION.cff.

@software{harmonic2026,
  title     = {Harmonic: Hierarchical State Space Models},
  author    = {Omibranch and {Harmonic Labs}},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20381713},
  url       = {https://github.com/Omibranch/Harmonic}
}

Concept DOI (always resolves to the latest version): 10.5281/zenodo.20381713 · this version: 10.5281/zenodo.20412249. Full training logs: https://github.com/Omibranch/harmonic-logs.

License

Harmonic is dual-licensed:

Free for noncommercial use — research, study, teaching, evaluation, and any other noncommercial purpose — under the PolyForm Noncommercial License 1.0.0. Cite us and build away.
Commercial use requires a paid commercial license. Shipping Harmonic (or a derivative) in a product or any revenue-generating context? See LICENSE-COMMERCIAL.md.

Questions, commercial inquiries, collaboration: contact@harmoniclabs.cc · harmoniclabs.cc

Harmonic Labs: we break walls. The public record is the proof we were here first.
