Merged
Binary file removed .DS_Store
Binary file not shown.
26 changes: 26 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,26 @@
name: CI

on:
  push:
    branches: [master]
  pull_request:

jobs:
  smoke-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install numpy mido pytest
          pip install torch --index-url https://download.pytorch.org/whl/cpu

      - name: Run smoke tests
        run: pytest tests/ -v
9 changes: 9 additions & 0 deletions .gitignore
@@ -99,3 +99,12 @@ ENV/

# mypy
.mypy_cache/

# macOS
.DS_Store

# Dembow artifacts
*.pt
generated/
# ...but the bundled pretrained model ships with the package
!dembow/assets/*.pt
174 changes: 144 additions & 30 deletions README.md
@@ -1,63 +1,177 @@
# Dembow
## The first A.I that generates reggaeton hits.
## The first A.I. that generates reggaeton hits. 🔥

![Denbow.jpg](denbow.jpg)

## Machine Learning Techniques
Using TensorFlow to generate short sequences of music with a [Restricted Boltzmann Machine](http://deeplearning4j.org/restrictedboltzmannmachine.html).
Do you want to go deep?, see the original technical idea: [How to build an RBM neural network in tensorflow](http://danshiebler.com/2016-08-10-musical-tensorflow-part-one-the-rbm/).
Dembow learns from a corpus of reggaeton MIDI and writes new tracks of its own.
It began life in 2016 as a Restricted Boltzmann Machine over a binary piano roll;
it is now a **decoder-only Transformer over an event-based music language** — the
same recipe behind modern symbolic-music models.

## How it works

Dembow treats music the way a language model treats text. Every song is
tokenized into a stream of musical **events** (REMI-style):

## Getting Started
```
BOS BAR POS_0 INST_drums DRUM_kick DUR_1 VEL_5
POS_0 INST_bass PITCH_36 DUR_4 VEL_6
POS_4 INST_drums DRUM_snare DUR_1 VEL_5 ...
BAR ... EOS
```

Each note carries its **instrument group** (drums / bass / mid / high), **pitch**,
**duration**, and **velocity** — so the model can write expressive,
multi-instrument arrangements, not a flat on/off grid. A small Transformer then
learns to predict the next event from everything before it, using masked
self-attention to capture phrasing, repetition, and the way the drums and bass
lock into the dembow groove.
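
A minimal sketch of how notes could become such an event stream. It assumes a 16-step bar grid, and the `Note` fields and `encode_notes` helper are hypothetical names for illustration; the real logic lives in `dembow/tokenizer.py` and may quantize differently.

```python
from dataclasses import dataclass

STEPS_PER_BAR = 16  # assumption: sixteenth-note grid, not confirmed by the repo


@dataclass
class Note:
    start: int   # onset in grid steps from the start of the song
    inst: str    # "drums", "bass", "mid" or "high"
    pitch: str   # MIDI pitch such as "36", or a drum name such as "kick"
    dur: int     # duration in grid steps
    vel: int     # velocity bucket


def encode_notes(notes):
    """Turn a list of notes into a REMI-style token list (illustrative only)."""
    tokens = ["BOS"]
    current_bar = -1
    for note in sorted(notes, key=lambda n: n.start):
        bar, pos = divmod(note.start, STEPS_PER_BAR)
        # Emit one BAR token per bar boundary crossed, including empty bars.
        while current_bar < bar:
            tokens.append("BAR")
            current_bar += 1
        kind = "DRUM" if note.inst == "drums" else "PITCH"
        tokens += [f"POS_{pos}", f"INST_{note.inst}", f"{kind}_{note.pitch}",
                   f"DUR_{note.dur}", f"VEL_{note.vel}"]
    tokens.append("EOS")
    return tokens


song = [
    Note(0, "drums", "kick", 1, 5),
    Note(0, "bass", "36", 4, 6),
    Note(4, "drums", "snare", 1, 5),
]
print(" ".join(encode_notes(song)))
```

The three notes in `song` reproduce the opening of the example stream above: a kick and a bass note at position 0, then a snare at position 4.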

Generation is autoregressive with **temperature + nucleus (top-p) sampling**, the
standard modern decoding strategy.
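
As a rough illustration of that decoding step, the sketch below applies temperature and then nucleus filtering to a plain list of logits. The function name `sample_top_p` is hypothetical and the code uses only the standard library; the project's own sampler in `dembow/generate.py` presumably works on PyTorch tensors and may differ in detail.

```python
import math
import random


def sample_top_p(logits, temperature=0.9, top_p=0.92, rng=random):
    """Sample a token index using temperature scaling and nucleus filtering."""
    # Temperature rescales logits: <1 sharpens the distribution, >1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-probable tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the nucleus in proportion to the surviving probabilities.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]
```

Generation would call this once per step on the model's next-token logits, append the chosen token, and repeat until `EOS` or the token budget is reached.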

## What changed from the original

1. Install [Tensorflow](https://www.tensorflow.org/). If you have trouble running Tensorflow installation it may help:
| Then (2016) | Now |
| --- | --- |
| Python 2, TensorFlow 1.x | Python 3, **PyTorch** |
| Restricted Boltzmann Machine | **Decoder-only Transformer** |
| Binary piano roll (on/off only) | **Event tokens**: pitch + duration + velocity |
| All tracks flattened into one roll | **Multi-instrument** (drums / bass / mid / high) |
| No sense of time | **Self-attention** over the whole sequence |
| Trained on ~76 raw files | **Pitch-augmented** corpus (×7) for generalization |
| `python-midi` (Py2, dead) | [`mido`](https://mido.readthedocs.io) |
| Threw the weights away | Saves & loads checkpoints |
| One-shot script | A real CLI + installable package + CI |

## Getting started

```sh
sudo easy_install pip
sudo pip install --upgrade virtualenv
export PIP_REQUIRE_VIRTUALENV=false
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt # numpy, mido, torch
# or install the package + `dembow` command:
pip install -e .
```

## Make magic happen

A **pretrained model ships with the package**, so you can generate immediately —
no training required:

```sh
dembow generate # uses the bundled model -> generated/dembow_*.mid
dembow generate --render # also write .wav so you can actually hear it
```

2. Install [Anaconda and dependencies](https://www.continuum.io/downloads)
Train your own (better, especially with more data):

3. Create virtualenv
```sh
virtualenv venv
dembow train # -> dembow.pt
dembow generate # picks up your dembow.pt automatically
```

Without installing:

```sh
python -m dembow.cli train
python -m dembow.cli generate
```

4. Activate venv
Or light the fire (train + generate in one go):

```sh
source venv/bin/activate
python fire.py
```

5. Install python_midi module in normal procedure
## Hardware presets

Transformers are slow to train on CPU, so `train` picks a preset automatically
(small model + early stopping on CPU, a bigger one on GPU). Override it, or any
individual flag:

```sh
git clone git@github.com:vishnubob/python-midi.git
cd python-midi
python setup.py install
dembow train --preset cpu # small + fast (auto-selected when no GPU)
dembow train --preset gpu # bigger model, more epochs, more augmentation
dembow train --preset gpu --d-model 320 --n-layers 6 # flags override the preset
```

6. Install remaining dependencies with pip.
- matplotlib
- numpy
- pandas
- msgpack
- glob
- tqdm
## Training quality: validation + early stopping

Because the corpus is tiny, overfitting is the main risk. Training holds out a
fraction of **songs** (not windows — so pitch-augmented copies can't leak),
reports validation loss each epoch, **saves the checkpoint with the best val
loss**, and stops early when it plateaus:

```sh
pip install [dependencies]
dembow train --val-frac 0.1 --patience 8
# epoch 12/40 train 1.49 val 1.71 *best (saved)
# ...
# Early stopping at epoch 23 (no val improvement for 8 epochs)
```
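
The two ideas in this section, splitting by song and stopping on a validation plateau, can be sketched as follows. Both helper names are hypothetical and the code is a simplified stand-in for what `dembow/data.py` and `dembow/train.py` do, not a copy of it.

```python
import random


def split_by_song(song_ids, val_frac=0.1, seed=0):
    """Hold out whole songs so every augmented copy of a song stays on one side."""
    ids = sorted(set(song_ids))
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(len(ids) * val_frac))
    return ids[n_val:], ids[:n_val]  # (train songs, validation songs)


def best_epoch_with_patience(val_losses, patience=8):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch val losses."""
    best_loss, best_epoch = float("inf"), None
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # A real training loop would save the checkpoint here.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch  # no improvement for `patience` epochs
    return best_epoch, len(val_losses)
```

Pitch-shifted copies are generated from a song only after it has been assigned to a side, which is why the split is keyed on song identity rather than on training windows.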

7. Make magic happen. First train your model with custom parameters and then wait the output.
## Generation

```sh
python fire.py
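# A comment after a trailing backslash breaks the line continuation,
# so the flags are described here instead:
#   --max-new-tokens      longer songs
#   --temperature         <1 tighter & more repetitive, >1 wilder
#   --top-p               nucleus sampling threshold
#   --repetition-penalty  discourage degenerate loops (1.0 = off)
#   --no-repeat-ngram     hard-ban repeated token n-grams (0 = off)
#   --prime-bars          real bars used to kick off each song
#   --seed-dir none       cold start instead of priming from a real song
dembow generate \
  --num-samples 8 \
  --max-new-tokens 1200 \
  --temperature 0.9 \
  --top-p 0.92 \
  --repetition-penalty 1.15 \
  --no-repeat-ngram 0 \
  --prime-bars 2 \
  --seed-dir none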
```

`--repetition-penalty` gently down-weights recently used tokens so the model
doesn't get stuck looping — while still allowing the musical repetition that
makes a groove a groove.
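
One common way to implement such a penalty is the CTRL-style rule sketched below, applied to the logits before sampling. This is an assumption about the approach, and `apply_repetition_penalty` is a hypothetical name; the README does not specify the exact formula Dembow uses.

```python
def apply_repetition_penalty(logits, recent_tokens, penalty=1.15):
    """Return a copy of `logits` with recently used token ids made less likely."""
    out = list(logits)
    for tok in set(recent_tokens):
        # Dividing a positive logit or multiplying a negative one both lower it,
        # so penalty > 1 always reduces the token's probability; 1.0 is a no-op.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out
```

Passing only a recent window of tokens, rather than the whole history, keeps the penalty from suppressing the recurring drum pattern across an entire song.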

### Hearing it (audio)

```sh
dembow generate --render # writes .wav next to each .mid
dembow generate --render --soundfont my.sf2 # better quality via FluidSynth
```

`--render` turns each generated song into audio. If [FluidSynth](https://www.fluidsynth.org/)
and a SoundFont are installed it uses them for realistic instruments; otherwise
it falls back to a small built-in synth so rendering works with no extra setup.
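
The fallback decision can be sketched like this. The `pick_renderer` helper and its return values are hypothetical; it only assumes that FluidSynth is detected by looking for a `fluidsynth` executable on `PATH`, which may not be how `dembow/render.py` checks.

```python
import shutil
from pathlib import Path


def pick_renderer(soundfont=None):
    """Choose 'fluidsynth' when both the binary and a SoundFont exist, else 'builtin'."""
    has_fluidsynth = shutil.which("fluidsynth") is not None
    has_soundfont = soundfont is not None and Path(soundfont).is_file()
    return "fluidsynth" if has_fluidsynth and has_soundfont else "builtin"
```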

## Hear it without training

A few example outputs from a small demo model live in [`examples/`](examples/)
so you can listen before training your own.

**Honest note on quality.** The corpus is only ~76 short MIDI files, so even a
Transformer is data-limited — it captures the *feel* (groove, instrumentation,
key) more than polished, hook-worthy songwriting. The single biggest lever is
**more clean MIDI** in `reggaeton_samples/` (see
[`reggaeton_samples/SOURCES.md`](reggaeton_samples/SOURCES.md) for where to find
it). Pitch augmentation and priming from real songs help it stay in the pocket
meanwhile.

## Project layout

```
dembow/
  tokenizer.py       MIDI <-> event tokens (the REMI-style music language)
  model.py           the decoder-only Transformer
  data.py            corpus loading, pitch augmentation, song-level train/val split
  train.py           training loop, validation, early stopping, best-checkpoint
  generate.py        sample new songs (temperature / top-p / repetition control)
  render.py          MIDI -> audio (FluidSynth, or a builtin dependency-free synth)
  cli.py             the `dembow` command (with cpu/gpu presets)
  assets/            a bundled pretrained model so generation works out of the box
fire.py              one-shot entry point
reggaeton_samples/   the MIDI corpus (+ SOURCES.md: where to find more)
examples/            a few generated outputs from a small demo model
tests/               a fast end-to-end smoke test
```
Depends of the technical capabilities of your computer, it can take from 5 to 10 minutes.

## Contribute
We need your help feeding and training our current model. If you have reggeaton samples feel free to contribute.

We still need your help feeding the model. If you have reggaeton MIDI, drop it in
`reggaeton_samples/` and open a pull request — more data is the single best way
to make Dembow sound like a hit.
25 changes: 25 additions & 0 deletions dembow/__init__.py
@@ -0,0 +1,25 @@
"""Dembow -- a Transformer that generates reggaeton.

The first A.I. that generates reggaeton hits, rebuilt around a modern,
event-based music language model.

The 2016 original trained a Restricted Boltzmann Machine on a binary piano roll.
This version replaces that entirely: songs are tokenized into a REMI-style stream
of musical events (bar, position, instrument, pitch, duration, velocity) and a
decoder-only Transformer learns to generate them one token at a time -- the same
recipe used by modern symbolic-music models.
"""

from .tokenizer import VOCAB, Vocab, encode, decode
from .model import MusicTransformer, ModelConfig

__version__ = "2.2.0"

__all__ = [
    "VOCAB",
    "Vocab",
    "encode",
    "decode",
    "MusicTransformer",
    "ModelConfig",
]
Binary file added dembow/assets/dembow-pretrained.pt
Binary file not shown.