
Rebuild Dembow as an event-token Transformer (v2.0.0)#1

Merged
baezor merged 8 commits into master from claude/music-generation-ml-revival-iw1ll3 on Jun 17, 2026

baezor (Owner) commented Jun 17, 2026

Dembow, rebuilt 🔥

The 2016 "first A.I. that generates reggaeton hits" was unrunnable (Python 2, TensorFlow 1.x, the dead python-midi) and, once revived, sounded like noise. This PR rebuilds the generator from the ground up around the modern recipe for symbolic music: a decoder-only Transformer over a REMI-style event language.

How it works now

Music is treated like language. Every song is tokenized into a stream of musical events:

BOS  BAR  POS_0  INST_drums DRUM_kick  DUR_1 VEL_5
              POS_0  INST_bass  PITCH_36   DUR_4 VEL_6
              POS_4  INST_drums DRUM_snare DUR_1 VEL_5  ...
     BAR  ...  EOS
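
As a rough illustration of how a stream like the one above could be produced, here is a minimal sketch. The `Note` record, the `encode` function, and the choice to emit one `BAR` token per bar (including empty ones) are assumptions made for this example; they are not taken from `dembow/tokenizer.py`.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note record; field names are illustrative only."""
    bar: int    # bar index, starting at 0
    pos: int    # position inside the bar, in grid steps
    inst: str   # instrument group: drums / bass / mid / high
    pitch: str  # drum name for drums, MIDI pitch number otherwise
    dur: int    # duration in grid steps
    vel: int    # quantized velocity bucket


def encode(notes: List[Note]) -> List[str]:
    """Turn notes into a flat event-token stream framed by BOS and EOS."""
    tokens = ["BOS"]
    current_bar = -1
    # sorted() is stable, so notes at the same position keep their input order.
    for n in sorted(notes, key=lambda n: (n.bar, n.pos)):
        # Emit one BAR token per bar boundary, including empty bars.
        while current_bar < n.bar:
            tokens.append("BAR")
            current_bar += 1
        kind = "DRUM" if n.inst == "drums" else "PITCH"
        tokens += [f"POS_{n.pos}", f"INST_{n.inst}", f"{kind}_{n.pitch}",
                   f"DUR_{n.dur}", f"VEL_{n.vel}"]
    tokens.append("EOS")
    return tokens


notes = [
    Note(0, 0, "drums", "kick", 1, 5),
    Note(0, 0, "bass", "36", 4, 6),
    Note(0, 4, "drums", "snare", 1, 5),
]
print(" ".join(encode(notes)))
```

Decoding would walk the same stream in reverse, using `BAR` and `POS_*` to recover absolute time for each note.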

Each note carries its instrument group (drums / bass / mid / high), pitch, duration, and velocity, so the model writes expressive, multi-instrument arrangements instead of a flat on/off grid. A small Transformer learns to predict the next event with masked self-attention, and generates autoregressively with temperature + nucleus (top-p) sampling.
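
For readers unfamiliar with the sampling step, the following is a minimal standard-library sketch of temperature plus nucleus (top-p) sampling over one vector of next-token logits. The function name `sample_top_p` and its defaults are assumptions for illustration; the actual code in `model.py` presumably operates on PyTorch tensors and may differ in detail.

```python
import math
import random
from typing import List, Optional


def sample_top_p(logits: List[float], temperature: float = 1.0,
                 top_p: float = 0.9,
                 rng: Optional[random.Random] = None) -> int:
    """Sample a token index using temperature scaling and nucleus filtering."""
    rng = rng or random.Random()

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # random.choices normalizes the weights, so no explicit renormalization.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]
```

Lower temperatures sharpen the distribution and a smaller `top_p` trims the unlikely tail, so together they trade variety for stability.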

What changed from the original

| Then (2016) | Now |
| --- | --- |
| Python 2, TensorFlow 1.x | Python 3, PyTorch |
| Restricted Boltzmann Machine | Decoder-only Transformer |
| Binary piano roll (on/off only) | Event tokens: pitch + duration + velocity |
| All tracks flattened into one roll | Multi-instrument (drums / bass / mid / high) |
| No sense of time | Self-attention over the whole sequence |
| Trained on ~76 raw files | Pitch-augmented corpus (~7×) |
| python-midi (Py2, dead) | mido |
| Threw the weights away | Saves & loads checkpoints |
| One-shot script | A real CLI + installable package + CI |

New layout

dembow/
  tokenizer.py   MIDI <-> event tokens (the REMI-style music language)
  model.py       the decoder-only Transformer (+ temperature / top-p sampling)
  data.py        corpus loading, pitch augmentation, windowing
  train.py       training loop + checkpointing
  generate.py    sample new songs and write MIDI
  cli.py         the `dembow` command
fire.py          one-shot entry point
tests/           a fast end-to-end smoke test
.github/workflows/ci.yml   runs the smoke tests on every PR

The legacy RBM / LSTM / groove / piano-roll modules are removed (preserved in git history).

Try it

pip install -r requirements.txt
dembow train       # -> dembow.pt
dembow generate    # -> generated/dembow_*.mid

Verified

  • ✅ Smoke tests pass (tokenizer round-trip, pitch augmentation, windowing, tiny train→generate→decode showing loss decreasing) — runs in CI on every push
  • ✅ Tokenizer round-trips MIDI to a 5-track arrangement and back
  • ✅ Generated token streams decode into valid multi-instrument MIDI

Honest note

The corpus is only ~76 short MIDI files, so even a Transformer is data-limited — it captures the feel (groove, instrumentation, key) more than polished songwriting. The biggest lever from here is more clean reggaeton MIDI in reggaeton_samples/.

🤖 Generated with Claude Code

https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8

claude added 5 commits June 17, 2026 04:43
The original RBM reggaeton generator could no longer run: it used Python 2
syntax, the removed TensorFlow 1.x graph API, and the unmaintained Python-2-only
python-midi library, and it never saved the weights it trained.

This brings it back to life while keeping its essence -- a Restricted Boltzmann
Machine that learns the dembow groove and Gibbs-samples new patterns:

- Reimplement the RBM in PyTorch (CD-k + Gibbs sampling), CPU/GPU capable,
  reproducible, with checkpoint save/load.
- Replace python-midi with mido for MIDI <-> piano-roll conversion; fix the
  glob that silently skipped uppercase .MID files in the corpus.
- Package it (dembow/) with a real CLI: `dembow train` / `dembow generate`,
  plus a nostalgic one-shot fire.py.
- Seed generation from real reggaeton grooves so output stays in the pocket.
- Add requirements.txt, pyproject.toml, a fast end-to-end smoke test, and a
  rewritten README documenting the revival.
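
The commit message does not show how the glob was fixed. One way to avoid silently skipping uppercase `.MID` files is to compare lower-cased suffixes instead of relying on a case-sensitive pattern. The helper name `find_midi_files` and the accepted suffix set are assumptions for this sketch.

```python
from pathlib import Path
from typing import List


def find_midi_files(root: str) -> List[Path]:
    """Return every .mid/.midi file under root, ignoring extension case."""
    wanted = {".mid", ".midi"}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )
```

A pattern such as `*.mid` matches case-sensitively on most Linux filesystems, which is the kind of situation where `SONG.MID` would be dropped without any error.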

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8

The revived RBM ran, but its output sounded like noise rather than reggaeton.
Two root causes: (1) all ~6-7 tracks of each song were flattened into one piano
roll, scrambling the dembow drums (channel 9, ~45% of corpus notes) in with bass
and melody; (2) the RBM models an unordered "bag of notes" with no sense of time
and sampled every pitch independently, yielding 300+ simultaneous notes.

Fix the representation for both engines and add a sequence model as the default:

- dembow/representation.py: separate drums into musical classes (kick, snare,
  hats, ...), transpose pitched content to a common key, and reconstruct a
  2-track (drums + pitched) MIDI on the way out.
- dembow/lstm.py: an LSTM that reads the song one 16th-note step at a time and
  predicts the next, so it learns the groove over time. Generation primes from a
  real song, keeps output sparse (top-k notes/step), and re-rolls if the beat
  drifts into silence.
- Keep the RBM as a "classic mode" (`--model rbm`); generation auto-detects the
  model type from the checkpoint.
- CLI: `dembow train --model lstm|rbm` plus generation knobs (num-steps,
  max-pitched, temperature). Default `dembow train` now trains the LSTM.
- Extend the smoke test to cover the representation round-trip and a tiny LSTM
  train+generate; rewrite the README to explain both engines, why the early
  output was noise, and how to push quality further.

Verified: LSTM training converges (loss 0.43 -> 0.18) and generation produces
musically dense output (~1-2.5 drums/step, ~3-5 pitched/step, matching real
songs) with consistent kick/snare/hat dembow grooves.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8

The LSTM grooved most of the time but could drop the snare or drift off the
beat, and the dembow drum pattern is the signature of the genre. So pin it down:

- dembow/groove.py: derive the canonical one-bar drum pattern straight from the
  corpus (average drum onsets across every bar, keep the positions that fire
  often). The textbook dembow emerges -- kick on the downbeats, snare at steps
  3/6/11/14 ("boom-ch-boom-chick"), steady hats -- with a hardcoded fallback.
- lstm.generate: accept a drum_track and lock the drums to it, so the model only
  improvises bass/melody, conditioned on a rock-solid beat.
- generate: new --groove auto|dembow|none (default auto, from the corpus).
  Refactor the per-sample roll-out with a guard that re-rolls if either the beat
  (when not locked) or the melody drifts into silence, so every track has both.
- Add a GitHub Actions CI workflow running the smoke tests on PRs, and extend
  the suite to cover groove extraction and drum-locking. Update the README.
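
The groove extraction described above (count drum onsets per step across all bars, keep the steps that fire often) can be sketched as follows. The 16-step grid, the 0.5 threshold, the `extract_groove` name, and the toy bars are all assumptions for illustration; the removed `dembow/groove.py` may have worked differently.

```python
from typing import Dict, List, Set

STEPS_PER_BAR = 16  # assumed 16th-note grid


def extract_groove(bars: List[Dict[str, Set[int]]],
                   threshold: float = 0.5) -> Dict[str, List[int]]:
    """Keep, per drum, the steps that fire in at least `threshold` of the bars."""
    counts: Dict[str, List[int]] = {}
    for bar in bars:
        for drum, steps in bar.items():
            row = counts.setdefault(drum, [0] * STEPS_PER_BAR)
            for step in steps:
                row[step] += 1
    n_bars = max(len(bars), 1)
    return {
        drum: [s for s, c in enumerate(row) if c / n_bars >= threshold]
        for drum, row in counts.items()
    }


# Made-up bars: one has a stray snare hit, one drops a kick.
bars = [
    {"kick": {0, 4, 8, 12}, "snare": {3, 6, 11, 14}},
    {"kick": {0, 4, 8, 12}, "snare": {3, 6, 11, 14, 15}},
    {"kick": {0, 8, 12}, "snare": {3, 6, 11, 14}},
]
print(extract_groove(bars))
```

Thresholding on frequency is what lets occasional fills and dropped hits wash out while the recurring pattern survives.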

Verified: with the groove on, all generated samples carry the identical steady
dembow beat (kick/snare/hat every bar) plus a melody; 7/7 smoke tests pass under
pytest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8

The smoke job failed with `ModuleNotFoundError: No module named 'dembow'`
because `pytest tests/` ran without the package installed or the repo root on
the path. Add pytest's pythonpath config so the local package is importable in
CI (and locally) without a separate install step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8

Per the new direction -- improve the engine rather than preserve the original
RBM soul -- replace the entire generation stack with the modern recipe for
symbolic music: a decoder-only Transformer over a REMI-style event language.

- tokenizer.py: encode MIDI into event tokens (bar, position, instrument group,
  pitch, duration, velocity) and decode back to a multi-track MIDI. Captures
  per-note duration/velocity and a drums/bass/mid/high arrangement -- far richer
  than the old binary piano roll.
- model.py: a decoder-only Transformer (causal self-attention, weight-tied
  embeddings) with temperature + nucleus (top-p) / top-k sampling.
- data.py: corpus loader with pitch-shift augmentation (~7x the ~76-song corpus)
  and windowing for next-token training.
- train.py / generate.py / cli.py: rewritten around the Transformer; generation
  primes from a couple of real bars and samples a continuation.
- Remove the legacy RBM / LSTM / groove / piano-roll modules (kept in history).
- Rewrite the smoke tests (tokenizer round-trip, augmentation, windowing, tiny
  train+generate+decode); update README and pyproject to v2.
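
To make the data pipeline concrete, here is a hedged sketch of pitch-shift augmentation and next-token windowing on token lists. It assumes shifts of -3 to +3 semitones (seven copies, which would match the "~7x" figure but is not confirmed by the commit), clamps to the MIDI range, and leaves drum tokens untouched. The names `transpose`, `augment`, and `windows` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def transpose(tokens: List[str], shift: int) -> List[str]:
    """Shift PITCH_* tokens by `shift` semitones; other events are unchanged."""
    out = []
    for tok in tokens:
        if tok.startswith("PITCH_"):
            pitch = int(tok[len("PITCH_"):]) + shift
            pitch = min(127, max(0, pitch))  # clamp to the valid MIDI range
            out.append(f"PITCH_{pitch}")
        else:
            out.append(tok)
    return out


def augment(tokens: List[str],
            shifts: Iterable[int] = range(-3, 4)) -> List[List[str]]:
    """Return one transposed copy of the song per shift (assumed: 7 copies)."""
    return [transpose(tokens, s) for s in shifts]


def windows(ids: List[int], size: int,
            stride: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (input, target) pairs; target is input shifted by one token."""
    for start in range(0, len(ids) - size, stride):
        chunk = ids[start:start + size + 1]
        yield chunk[:-1], chunk[1:]
```

Drum tokens are skipped because transposing a kick or snare would change which drum plays rather than the key of the song.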

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8
baezor changed the title from "Revive Dembow on a modern stack (Python 3 + PyTorch + mido)" to "Rebuild Dembow as an event-token Transformer (v2.0.0)" on Jun 17, 2026
claude added 3 commits June 17, 2026 05:55
…data guide

Address several improvements for training quality and usability (v2.1.0):

- Validation + early stopping (train.py, data.py): hold out songs at the *song*
  level (so pitch-augmented copies don't leak), report val loss each epoch, save
  the checkpoint with the best val loss, and stop early when it plateaus. On a
  ~76-song corpus this is what separates generalizing from memorizing.
- Hardware presets (cli.py): `--preset cpu|gpu|auto` picks model size / epochs /
  augmentation sensibly (CPU is small + early-stops; GPU is bigger). Explicit
  flags still override. Fixes the default config timing out on CPU.
- Repetition control (model.py): `--repetition-penalty` gently down-weights
  recently used tokens and optional `--no-repeat-ngram` hard-bans exact repeats,
  so generation doesn't collapse into a degenerate loop -- while still allowing
  the musical repetition that makes a groove.
- reggaeton_samples/SOURCES.md: where to find more training MIDI (free libraries,
  open datasets, audio-to-MIDI), cleaning tips, and licensing notes.
- Extend smoke tests (split disjointness, repetition controls); update README.
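
The two repetition controls can be sketched on plain Python lists. The penalty below follows the common convention of dividing positive logits and multiplying negative ones by the penalty; whether `model.py` uses exactly this rule is an assumption, as are the function names.

```python
from typing import List, Sequence, Set


def apply_repetition_penalty(logits: Sequence[float], recent: Sequence[int],
                             penalty: float = 1.2) -> List[float]:
    """Down-weight tokens that appear in `recent` (assumed penalty rule)."""
    out = list(logits)
    for tok in set(recent):
        # Dividing a positive logit or multiplying a negative one both
        # lower the token's probability when penalty > 1.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def banned_by_ngram(history: Sequence[int], n: int) -> Set[int]:
    """Tokens that would complete an n-gram already present in `history`."""
    if n < 1:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(history[len(history) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(history) - n + 1):
        if tuple(history[i:i + n - 1]) == prefix:
            banned.add(history[i + n - 1])
    return banned
```

A sampler would set the logits of every token in `banned_by_ngram(...)` to negative infinity before sampling, while the softer penalty only nudges probabilities and so leaves room for the repetition a groove needs.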

Example outputs from a demo model are added in a follow-up commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8

Three multi-track MIDI files (drums + bass + mid + high) so listeners can hear
Dembow without training first. Generated with repetition-penalty 1.2 from a small
demo model (val loss ~1.44) trained on the bundled corpus. See examples/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8

Make Dembow enjoyable without any setup:

- Bundle a small pretrained checkpoint (dembow/assets/dembow-pretrained.pt) and
  fall back to it when no local checkpoint exists, so `dembow generate` works out
  of the box -- no training step. Included as package data; gitignore exception
  keeps it tracked despite the global *.pt ignore.
- Add render.py + `--render`: turn generated MIDI into .wav so you can actually
  hear it. Uses FluidSynth + a SoundFont when available, otherwise a tiny
  dependency-free NumPy synth (oscillators for pitched parts, shaped noise for
  drums) so rendering always works.
- Wire `--render` / `--soundfont` into the CLI; extend smoke tests (bundled model
  loads, builtin render produces audio); update README.
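
To give a feel for the fallback synth idea, here is a small sketch of a sine oscillator written to a 16-bit mono WAV file. It uses only the Python standard library rather than NumPy, and the sample rate, envelope, and function names are assumptions; it is not the code in `render.py`.

```python
import math
import struct
import wave
from typing import List

SAMPLE_RATE = 22050  # assumed; chosen only to keep the example small


def midi_to_hz(pitch: int) -> float:
    """Equal-tempered frequency of a MIDI pitch (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


def sine_note(pitch: int, seconds: float, amp: float = 0.3) -> List[float]:
    """One sine-oscillator note with a linear fade-out to avoid clicks."""
    n = int(SAMPLE_RATE * seconds)
    hz = midi_to_hz(pitch)
    return [amp * (1 - i / n) * math.sin(2 * math.pi * hz * i / SAMPLE_RATE)
            for i in range(n)]


def write_wav(path: str, samples: List[float]) -> None:
    """Write samples in [-1, 1] as mono 16-bit PCM."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
```

Calling `write_wav("note.wav", sine_note(36, 0.5))` would render a half-second bass note; drums would need a noise source with a short decay instead of a sine.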

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8
baezor merged commit 862bd97 into master on Jun 17, 2026
1 check passed