Rebuild Dembow as an event-token Transformer (v2.0.0) #1
Merged
The original RBM reggaeton generator could no longer run: it used Python 2 syntax, the removed TensorFlow 1.x graph API, and the unmaintained Python-2-only python-midi library, and it never saved the weights it trained. This brings it back to life while keeping its essence -- a Restricted Boltzmann Machine that learns the dembow groove and Gibbs-samples new patterns:
- Reimplement the RBM in PyTorch (CD-k + Gibbs sampling), CPU/GPU capable, reproducible, with checkpoint save/load.
- Replace python-midi with mido for MIDI <-> piano-roll conversion; fix the glob that silently skipped uppercase .MID files in the corpus.
- Package it (dembow/) with a real CLI: `dembow train` / `dembow generate`, plus a nostalgic one-shot fire.py.
- Seed generation from real reggaeton grooves so output stays in the pocket.
- Add requirements.txt, pyproject.toml, a fast end-to-end smoke test, and a rewritten README documenting the revival.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8
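A minimal sketch of the uppercase-.MID glob fix, assuming the corpus is a directory such as reggaeton_samples/ and that .midi files should count too. `find_midi_files` is a hypothetical helper, not necessarily the code in this commit; the idea is to compare the lower-cased suffix instead of relying on a case-sensitive pattern like `*.mid`.

```python
from pathlib import Path


def find_midi_files(corpus_dir):
    """Return every .mid/.midi file under corpus_dir, whatever the suffix case.

    Path.glob("*.mid") is case-sensitive on Linux and would miss "SONG.MID",
    so list everything and filter on the lower-cased suffix instead.
    """
    return sorted(
        p for p in Path(corpus_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in {".mid", ".midi"}
    )
```

Usage: `find_midi_files("reggaeton_samples")` picks up both `song.mid` and `SONG.MID`. Filtering on `Path.suffix.lower()` behaves the same on every platform, whereas how glob patterns treat case depends on the OS.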
The revived RBM ran, but its output sounded like noise rather than reggaeton. Two root causes: (1) all ~6-7 tracks of each song were flattened into one piano roll, scrambling the dembow drums (channel 9, ~45% of corpus notes) in with bass and melody; (2) the RBM models an unordered "bag of notes" with no sense of time and sampled every pitch independently, yielding 300+ simultaneous notes. Fix the representation for both engines and add a sequence model as the default:
- dembow/representation.py: separate drums into musical classes (kick, snare, hats, ...), transpose pitched content to a common key, and reconstruct a 2-track (drums + pitched) MIDI on the way out.
- dembow/lstm.py: an LSTM that reads the song one 16th-note step at a time and predicts the next, so it learns the groove over time. Generation primes from a real song, keeps output sparse (top-k notes/step), and re-rolls if the beat drifts into silence.
- Keep the RBM as a "classic mode" (`--model rbm`); generation auto-detects the model type from the checkpoint.
- CLI: `dembow train --model lstm|rbm` plus generation knobs (num-steps, max-pitched, temperature). Default `dembow train` now trains the LSTM.
- Extend the smoke test to cover the representation round-trip and a tiny LSTM train+generate; rewrite the README to explain both engines, why the early output was noise, and how to push quality further.
Verified: LSTM training converges (loss 0.43 -> 0.18) and generation produces musically dense output (~1-2.5 drums/step, ~3-5 pitched/step, matching real songs) with consistent kick/snare/hat dembow grooves.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8
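To make the drum separation concrete, here is a minimal sketch of bucketing General MIDI percussion notes into musical classes on a 16th-note grid. The exact class list, `DRUM_CLASSES` and `drum_roll` are assumptions for illustration; the commit only names kick, snare and hats explicitly, and representation.py may group notes differently.

```python
# Hypothetical drum classes keyed by General MIDI percussion note numbers.
# GM drums live on MIDI channel 10, which mido numbers as channel 9.
DRUM_CLASSES = {
    "kick": {35, 36},                        # acoustic bass drum, bass drum 1
    "snare": {37, 38, 39, 40},               # side stick, snares, hand clap
    "hats": {42, 44, 46},                    # closed, pedal and open hi-hat
    "toms": {41, 43, 45, 47, 48, 50},
    "cymbals": {49, 51, 52, 53, 55, 57, 59},
}
CLASS_NAMES = list(DRUM_CLASSES)             # fixed order -> column index
NOTE_TO_CLASS = {note: i for i, name in enumerate(CLASS_NAMES)
                 for note in DRUM_CLASSES[name]}


def drum_roll(hits, num_steps):
    """Turn (step, gm_note) drum hits into a num_steps x classes 0/1 grid.

    Notes outside the table (cowbell, shakers, ...) and steps outside the
    grid are dropped; a real representation might keep an "other" class.
    """
    grid = [[0] * len(CLASS_NAMES) for _ in range(num_steps)]
    for step, note in hits:
        cls = NOTE_TO_CLASS.get(note)
        if cls is not None and 0 <= step < num_steps:
            grid[step][cls] = 1
    return grid
```

Grouping by role rather than by raw note number means a kick on note 35 in one file and note 36 in another lands in the same column, which is what lets a model see the same dembow across the corpus.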
The LSTM grooved most of the time but could drop the snare or drift off the
beat, and the dembow drum pattern is the signature of the genre. So pin it down:
- dembow/groove.py: derive the canonical one-bar drum pattern straight from the
corpus (average drum onsets across every bar, keep the positions that fire
often). The textbook dembow emerges -- kick on the downbeats, snare at steps
3/6/11/14 ("boom-ch-boom-chick"), steady hats -- with a hardcoded fallback.
- lstm.generate: accept a drum_track and lock the drums to it, so the model only
improvises bass/melody, conditioned on a rock-solid beat.
- generate: new --groove auto|dembow|none (default auto, from the corpus).
Refactor the per-sample roll-out with a guard that re-rolls if either the beat
(when not locked) or the melody drifts into silence, so every track has both.
- Add a GitHub Actions CI workflow running the smoke tests on PRs, and extend
the suite to cover groove extraction and drum-locking. Update the README.
Verified: with the groove on, all generated samples carry the identical steady
dembow beat (kick/snare/hat every bar) plus a melody; 7/7 smoke tests pass under
pytest.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8
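The corpus-derived groove can be sketched as a per-step onset frequency with a cut-off. `extract_groove`, the per-bar dict format, the 0.5 threshold and the eighth-note hats in `DEMBOW_FALLBACK` are assumptions for illustration; only the kick and snare steps come from the commit message above (read here as 0-indexed 16th-note steps).

```python
def extract_groove(bars, threshold=0.5, fallback=None, steps=16):
    """Derive a one-bar drum pattern from many bars of drum onsets.

    `bars` is a list of dicts mapping a drum class (e.g. "kick") to a list of
    `steps` 0/1 onsets. For each class and step, average the onsets over all
    bars and keep the steps that fire in at least `threshold` of them.
    Falls back to `fallback` when there is no data or nothing survives.
    """
    if not bars:
        return fallback
    pattern = {}
    for name in sorted({name for bar in bars for name in bar}):
        counts = [0] * steps
        for bar in bars:
            for step, hit in enumerate(bar.get(name, [0] * steps)):
                counts[step] += hit
        kept = [s for s, c in enumerate(counts) if c / len(bars) >= threshold]
        if kept:
            pattern[name] = kept
    return pattern or fallback


# Hypothetical hardcoded fallback: kick on the four downbeats, snare on the
# dembow steps 3/6/11/14 (0-indexed 16ths), hats assumed on every 8th note.
DEMBOW_FALLBACK = {
    "kick": [0, 4, 8, 12],
    "snare": [3, 6, 11, 14],
    "hats": [0, 2, 4, 6, 8, 10, 12, 14],
}
```

Averaging over every bar means fills and one-off variations fall below the threshold, while the hits that recur bar after bar (the "boom-ch-boom-chick") survive.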
The smoke job failed with `ModuleNotFoundError: No module named 'dembow'` because `pytest tests/` ran without the package installed or the repo root on the path. Add pytest's pythonpath config so the local package is importable in CI (and locally) without a separate install step.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8
Per the new direction -- improve the engine rather than preserve the original RBM soul -- replace the entire generation stack with the modern recipe for symbolic music: a decoder-only Transformer over a REMI-style event language.
- tokenizer.py: encode MIDI into event tokens (bar, position, instrument group, pitch, duration, velocity) and decode back to a multi-track MIDI. Captures per-note duration/velocity and a drums/bass/mid/high arrangement -- far richer than the old binary piano roll.
- model.py: a decoder-only Transformer (causal self-attention, weight-tied embeddings) with temperature + nucleus (top-p) / top-k sampling.
- data.py: corpus loader with pitch-shift augmentation (~7x the ~76-song corpus) and windowing for next-token training.
- train.py / generate.py / cli.py: rewritten around the Transformer; generation primes from a couple of real bars and samples a continuation.
- Remove the legacy RBM / LSTM / groove / piano-roll modules (kept in history).
- Rewrite the smoke tests (tokenizer round-trip, augmentation, windowing, tiny train+generate+decode); update README and pyproject to v2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8
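For intuition about the event language, a minimal sketch of REMI-style encoding. The `Note` dataclass, the token spellings, 4/4 time with 16 steps per bar, and 8 velocity bins are all assumptions; tokenizer.py may order or name its events differently.

```python
from dataclasses import dataclass


@dataclass
class Note:
    start: int      # onset, in MIDI ticks
    duration: int   # length, in MIDI ticks
    pitch: int      # MIDI note number (0-127)
    velocity: int   # MIDI velocity (1-127)
    group: str      # "drums", "bass", "mid" or "high"


def remi_tokens(notes, ticks_per_beat=480, steps_per_bar=16, velocity_bins=8):
    """Encode notes as REMI-style event tokens (4/4 time assumed).

    A "Bar" token opens every bar (empty bars included); a "Position" token
    marks each new 16th-note step, and every note then contributes
    Instrument, Pitch, Duration (in steps) and a bucketed Velocity.
    """
    step_ticks = ticks_per_beat * 4 // steps_per_bar  # ticks per 16th note
    tokens, current_bar, last_pos = [], -1, None
    for n in sorted(notes, key=lambda n: (n.start, n.group, n.pitch)):
        bar, pos = divmod(round(n.start / step_ticks), steps_per_bar)
        while current_bar < bar:
            tokens.append("Bar")
            current_bar += 1
            last_pos = None
        if pos != last_pos:
            tokens.append(f"Position_{pos}")
            last_pos = pos
        duration = max(1, round(n.duration / step_ticks))
        velocity = min(velocity_bins - 1, n.velocity * velocity_bins // 128)
        tokens += [f"Instrument_{n.group}", f"Pitch_{n.pitch}",
                   f"Duration_{duration}", f"Velocity_{velocity}"]
    return tokens
```

Usage: a bass note and a kick on the first 16th of bar 0, then a snare on step 3 of bar 1, encode as `Bar, Position_0, Instrument_bass, ..., Instrument_drums, ..., Bar, Position_3, Instrument_drums, ...`. Decoding walks the same grammar in reverse, which is what makes the round-trip test possible.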
…data guide
Address several improvements for training quality and usability (v2.1.0):
- Validation + early stopping (train.py, data.py): hold out songs at the *song* level (so pitch-augmented copies don't leak), report val loss each epoch, save the checkpoint with the best val loss, and stop early when it plateaus. On a ~76-song corpus this is what separates generalizing from memorizing.
- Hardware presets (cli.py): `--preset cpu|gpu|auto` picks model size / epochs / augmentation sensibly (CPU is small + early-stops; GPU is bigger). Explicit flags still override. Fixes the default config timing out on CPU.
- Repetition control (model.py): `--repetition-penalty` gently down-weights recently used tokens and optional `--no-repeat-ngram` hard-bans exact repeats, so generation doesn't collapse into a degenerate loop -- while still allowing the musical repetition that makes a groove.
- reggaeton_samples/SOURCES.md: where to find more training MIDI (free libraries, open datasets, audio-to-MIDI), cleaning tips, and licensing notes.
- Extend smoke tests (split disjointness, repetition controls); update README.
Example outputs from a demo model are added in a follow-up commit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8
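The two repetition controls can be sketched over plain lists of logits. This swaps in the CTRL-style penalty rule (divide positive logits, multiply negative ones) as an assumption; `apply_repetition_penalty`, `banned_by_ngram` and the 64-token window are hypothetical, and model.py may define "recently used" differently.

```python
def apply_repetition_penalty(logits, history, penalty=1.2, window=64):
    """CTRL-style penalty: make recently used tokens less likely.

    logits: floats indexed by token id; history: generated token ids.
    Positive logits are divided by `penalty` and negative ones multiplied,
    so a penalized token always moves toward lower probability.
    """
    out = list(logits)
    for tok in set(history[-window:]):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def banned_by_ngram(history, n):
    """Token ids that would complete an n-gram already present in `history`."""
    if n <= 0 or len(history) < n - 1:
        return set()
    prefix = tuple(history[len(history) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(history) - n + 1):
        if tuple(history[i:i + n - 1]) == prefix:
            banned.add(history[i + n - 1])
    return banned
```

The soft penalty only nudges probabilities, so a kick that recurs every bar can still win; the n-gram ban is absolute, which is why it is opt-in and best used with an n large enough that a bar-level groove is not itself one banned n-gram.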
Three multi-track MIDI files (drums + bass + mid + high) so listeners can hear Dembow without training first. Generated with repetition-penalty 1.2 from a small demo model (val loss ~1.44) trained on the bundled corpus. See examples/README.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8
Make Dembow enjoyable without any setup:
- Bundle a small pretrained checkpoint (dembow/assets/dembow-pretrained.pt) and fall back to it when no local checkpoint exists, so `dembow generate` works out of the box -- no training step. Included as package data; a gitignore exception keeps it tracked despite the global *.pt ignore.
- Add render.py + `--render`: turn generated MIDI into .wav so you can actually hear it. Uses FluidSynth + a SoundFont when available, otherwise a tiny dependency-free NumPy synth (oscillators for pitched parts, shaped noise for drums) so rendering always works.
- Wire `--render` / `--soundfont` into the CLI; extend smoke tests (bundled model loads, builtin render produces audio); update README.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8
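The built-in fallback renderer can be sketched with the standard library alone (the commit's version uses NumPy). `render_notes`, its `(start_s, dur_s, pitch, is_drum)` note tuples, the linear fade and the 30/s drum decay are assumptions chosen for illustration, not render.py's actual envelopes.

```python
import math
import random
import struct
import wave


def render_notes(notes, path, sample_rate=22050):
    """Render (start_s, dur_s, midi_pitch, is_drum) tuples to a mono 16-bit WAV.

    Pitched notes use a sine oscillator with a linear fade-out; drums use
    white noise with a fast exponential decay. Returns the number of frames.
    """
    spans = [(int(s * sample_rate), int(d * sample_rate)) for s, d, _, _ in notes]
    buf = [0.0] * max((first + length for first, length in spans), default=0)
    rng = random.Random(0)  # fixed seed so the noise is reproducible
    for (_, _, pitch, is_drum), (first, length) in zip(notes, spans):
        freq = 440.0 * 2 ** ((pitch - 69) / 12)  # equal temperament, A4 = 69
        for i in range(length):
            t = i / sample_rate
            if is_drum:
                sample = rng.uniform(-1.0, 1.0) * math.exp(-30.0 * t)
            else:
                sample = math.sin(2 * math.pi * freq * t) * (1 - i / length)
            buf[first + i] += 0.3 * sample
    peak = max((abs(x) for x in buf), default=0.0)
    scale = 32767 / max(peak, 1.0)  # scale down only if the mix would clip
    frames = b"".join(struct.pack("<h", int(x * scale)) for x in buf)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return len(buf)
```

It will never sound like a SoundFont, but noise bursts for drums and sines for pitched parts are enough to hear whether the dembow and the bassline line up.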
Dembow, rebuilt 🔥
The 2016 "first A.I. that generates reggaeton hits" was unrunnable (Python 2, TensorFlow 1.x, the dead python-midi) and, once revived, sounded like noise. This PR rebuilds the generator from the ground up around the modern recipe for symbolic music: a decoder-only Transformer over a REMI-style event language.
How it works now
Music is treated like language. Every song is tokenized into a stream of musical events (bar, position, instrument group, pitch, duration, velocity).
Each note carries its instrument group (drums / bass / mid / high), pitch, duration, and velocity, so the model writes expressive, multi-instrument arrangements instead of a flat on/off grid. A small Transformer learns to predict the next event with masked self-attention, and generates autoregressively with temperature + nucleus (top-p) sampling.
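Temperature plus nucleus (top-p) sampling over the model's next-token logits can be sketched in plain Python. `sample_top_p` and its defaults are hypothetical; model.py presumably works on PyTorch tensors, but the filtering logic is the same idea.

```python
import math
import random


def sample_top_p(logits, temperature=1.0, top_p=0.9, top_k=0, rng=random):
    """Sample a token id: temperature, optional top-k, then nucleus (top-p).

    temperature must be > 0; lower values sharpen the distribution.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    probs = [math.exp(x - m) for x in scaled]  # numerically stable softmax
    total = sum(probs)
    probs = [p / total for p in probs]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Keep the smallest high-probability prefix whose mass reaches top_p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # random.choices renormalizes the kept weights for us.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Cutting the long tail is what keeps a generated bass line from suddenly jumping to an implausible note, while sampling (rather than always taking the argmax) keeps each run different.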
What changed from the original
python-midi (Py2, dead) → mido
New layout
The legacy RBM / LSTM / groove / piano-roll modules are removed (preserved in git history).
Try it
Verified
Honest note
The corpus is only ~76 short MIDI files, so even a Transformer is data-limited: it captures the feel (groove, instrumentation, key) more than polished songwriting. The biggest lever from here is more clean reggaeton MIDI in reggaeton_samples/.
🤖 Generated with Claude Code
https://claude.ai/code/session_01Ho9V6TBXozB23VjHqvXuT8