Natural language to playlist — a multi-modal RAG system that turns mood descriptions into curated playlists
"melancholic late night jazz with cigarette smoke energy" → 20 tracks with streaming links
Architecture • How It Works • Quick Start • Example Results • Tech Stack
MoodQueue converts expressive natural language queries into curated playlists by searching across four parallel vector indexes — artist text embeddings, graph-enhanced artist embeddings, audio feature vectors, and verse-level lyrics embeddings — then merging results with Reciprocal Rank Fusion and sequencing the final playlist with an LLM reranker.
It handles queries that traditional playlist generators can't:
| Query | What makes it hard | What MoodQueue does |
|---|---|---|
| "melancholic late night jazz with cigarette smoke energy" | No tag or genre captures "cigarette smoke energy" | HyDE generates a synthetic artist bio + verse, embeds them, searches 4 indexes |
| "songs about getting rich, hustle, success" | Thematic — needs lyric understanding | Dual lyrics search (HyDE verse + raw query) finds "Big Rich Town", "Dirtee Cash" |
| "angry 90s grunge, like screaming into the void" | Combines mood + era + genre + metaphor | Genre-aware HyDE generates grunge-style lyrics, genre one-hot filters out non-rock |
| "more upbeat, add some electronic" | Requires conversation memory | LangGraph accumulates modifiers, resynthesizes query, excludes prior tracks |
```text
              ┌─────────────────────────────────────────────┐
              │                 User Query                  │
              │        "melancholic late night jazz"        │
              └──────────────────────┬──────────────────────┘
                                     │
                        ┌────────────▼────────────┐
                        │    Intent Classifier    │
                        │   (Gemini Flash 2.5)    │
                        │    DESCRIBE / SEED /    │
                        │         REFINE          │
                        └────────────┬────────────┘
                                     │
                        ┌────────────▼────────────┐
                        │      Query Builder      │
                        │ DESCRIBE → pass through │
                        │ SEED → prepend artist   │
                        │ REFINE → resynthesize   │
                        └────────────┬────────────┘
                                     │
              ┌──────────────────────▼──────────────────────┐
              │               HyDE Generation               │
              │                                             │
              │ LLM generates:                              │
              │ • Artist bio (factual, genre-specific)      │
              │ • Song verse (genre-aware style)            │
              │ • Mood vector (11-dim)                      │
              │ • Genre labels (5-10, mapped to one-hot)    │
              └──────────────────────┬──────────────────────┘
                                     │
      ┌──────────────┬───────────────┼───────────────────┐
      │              │               │                   │
┌─────▼─────┐  ┌─────▼─────┐  ┌──────▼──────┐   ┌────────▼─────────┐
│ Snowflake │  │ GAT proj  │  │ mood+genre  │   │ mpnet (768d)     │
│ (1024d)   │  │ (128d)    │  │ (31d)       │   │ HyDE verse +     │
│           │  │           │  │             │   │ raw query        │
└─────┬─────┘  └─────┬─────┘  └──────┬──────┘   └────────┬─────────┘
      │              │               │                   │
┌─────▼──────────────▼─────┐  ┌──────▼──────┐   ┌────────▼─────────┐
│ Path A: Artist Funnel    │  │ Path B:     │   │ Path B: Lyrics   │
│                          │  │ Audio       │   │                  │
│ Artist FAISS (text+GAT)  │  │ Song AB     │   │ Lyrics FAISS     │
│ → top 50 artists         │  │ FAISS       │   │ (dual search)    │
│ → mood-ranked songs      │  │ → top 100   │   │ → ~200 songs     │
│ → 100 candidates         │  │             │   │                  │
└────────────┬─────────────┘  └──────┬──────┘   └────────┬─────────┘
             │                       │                   │
             └───────────────────────┼───────────────────┘
                         ┌───────────▼───────────┐
                         │  Weighted RRF Merge   │
                         │ lyrics=0.50           │
                         │ artist=0.25           │
                         │ audio=0.25            │
                         │ ~400 → top 50         │
                         └───────────┬───────────┘
                                     │
                         ┌───────────▼───────────┐
                         │  Two-Stage Reranker   │
                         │                       │
                         │ 1. Rule filter:       │
                         │    max 2/artist,      │
                         │    dedup titles       │
                         │ 2. LLM sequences      │
                         │    for playlist flow  │
                         └───────────┬───────────┘
                                     │
                         ┌───────────▼───────────┐
                         │    URI Resolution     │
                         │ ISRC → Odesli API     │
                         │ Fallback: iTunes      │
                         │ → Spotify / Apple /   │
                         │   YouTube links       │
                         └───────────────────────┘
```
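The merge step can be sketched in a few lines of Python. This is a minimal illustration, not MoodQueue's `rrf` module: it assumes each path hands over a best-first list of track IDs, reuses the diagram's weights, and assumes the conventional RRF constant k = 60, which this README does not state. Each track scores the sum of `w_path / (k + rank)` over every path that returned it, so a track surfaced by several paths accumulates score from each.

```python
from collections import defaultdict

# Path weights from the diagram. K_RRF = 60 is the constant commonly used
# with Reciprocal Rank Fusion; it is an assumption, not a documented setting.
WEIGHTS = {"lyrics": 0.50, "artist": 0.25, "audio": 0.25}
K_RRF = 60


def weighted_rrf(ranked_lists, weights=WEIGHTS, k=K_RRF, top_n=50):
    """Fuse best-first lists of track IDs into one ranking.

    ranked_lists maps a path name ("artist", "audio", "lyrics") to track IDs
    ordered best-first. Only ranks are used, never raw similarity scores.
    """
    fused = defaultdict(float)
    for path, track_ids in ranked_lists.items():
        weight = weights.get(path, 0.0)
        for rank, track_id in enumerate(track_ids, start=1):
            # A track found by several paths accumulates a contribution from each.
            fused[track_id] += weight / (k + rank)
    # Highest fused score first; ties broken by track ID so output is stable.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    merged = weighted_rrf({
        "artist": ["t1", "t2", "t3"],
        "audio": ["t2", "t4"],
        "lyrics": ["t5", "t2"],
    })
    for track_id, score in merged:
        print(f"{track_id}  {score:.5f}")
```

Because only ranks enter the formula, cosine scores from the 1024-d artist index and the 31-d audio index never have to be put on a common scale, which is the property the Tech Stack table calls score-agnostic.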
The data pipeline transforms 90GB of raw music data into a DuckDB warehouse and four FAISS vector indexes:
| Source | What we extract | Volume |
|---|---|---|
| MusicBrainz | Artists, recordings, tags, relationships, ISRCs | 2.8M artists, 38M recordings |
| Discogs | Genre taxonomy (16 genres, 757 styles) | 79% artist coverage |
| AcousticBrainz | Per-recording mood/audio classifiers (14 features) | 6.8M recordings |
| Kaggle/Genius | Verse-chunked lyrics | 5.9M songs |
| Last.fm | Artist similarity graph, tags, play counts | 247K edges (dev) |
| Index | Dim | Model | What it captures |
|---|---|---|---|
| Artist text | 1024 | Snowflake Arctic Embed L v2 | "Who is this artist?" — tags, genres, styles, recording tag distributions |
| Artist GAT | 128 | Graph Attention Network | Same + neighborhood signal from similarity graph. Sparse artists inherit from neighbors. |
| Song AB | 31 | AcousticBrainz + genre one-hot | "What does this song sound like?" — 11 mood dims + 20 genre categories |
| Lyrics | 768 | all-mpnet-base-v2 | "What is this song about?" — verse-level semantic matching |
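A sketch of how a 31-dimension Song AB query vector could be assembled from the HyDE mood and genre output. The 11 mood dimension names, the 20 genre categories, and the alias table below are hypothetical placeholders (the README gives only the counts). Several genre labels can light up several slots, so the genre block is effectively multi-hot. The vector is L2-normalized so that an inner-product index such as FAISS `IndexFlatIP` ranks by cosine similarity.

```python
import math

# Hypothetical labels: the README specifies 11 mood dims and 20 genre
# categories but not their names or order.
MOOD_DIMS = [
    "acoustic", "aggressive", "electronic", "happy", "party", "relaxed",
    "sad", "danceable", "instrumental", "tonal", "bright",
]
GENRE_CATEGORIES = [
    "ambient", "blues", "classical", "country", "electronic", "experimental",
    "folk", "funk", "hip hop", "jazz", "latin", "metal", "pop", "punk",
    "r&b", "reggae", "rock", "soul", "soundtrack", "world",
]
# Hypothetical alias table mapping free-text HyDE genre labels to categories.
GENRE_ALIASES = {
    "trap": "hip hop", "rap": "hip hop", "gangsta rap": "hip hop",
    "boom bap": "hip hop", "dirty south": "hip hop",
    "cool jazz": "jazz", "bebop": "jazz", "grunge": "rock",
}


def genre_multi_hot(labels):
    """Set one slot per recognized category; unknown labels are ignored."""
    vec = [0.0] * len(GENRE_CATEGORIES)
    for label in labels:
        key = label.strip().lower()
        key = GENRE_ALIASES.get(key, key)
        if key in GENRE_CATEGORIES:
            vec[GENRE_CATEGORIES.index(key)] = 1.0
    return vec


def song_ab_vector(moods, genre_labels):
    """Concatenate 11 mood scores and 20 genre slots, then L2-normalize.

    On unit vectors, inner product (what IndexFlatIP computes) equals cosine.
    """
    vec = [float(moods.get(dim, 0.0)) for dim in MOOD_DIMS]
    vec += genre_multi_hot(genre_labels)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


if __name__ == "__main__":
    q = song_ab_vector({"sad": 0.7, "relaxed": 0.6}, ["jazz", "cool jazz", "lounge"])
    print(len(q), round(sum(x * x for x in q), 6))
```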
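User queries ("cigarette smoke energy") and indexed content ("Tags: jazz, cool jazz. Mood: sad=0.7") come from different distributions, so a raw query embedding can land far from the documents it should match. HyDE (Hypothetical Document Embeddings) bridges this gap: the LLM writes synthetic documents shaped like the ones already in the index, and those are embedded for search: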
```text
Query:       "songs about getting rich, hustle"
HyDE bio:    "A trap and hip-hop artist known for aggressive, boastful
              lyrics over heavy 808s. Themes of wealth accumulation,
              street life, and ambition."
HyDE verse:  "Woke up this morning, paper on my mind
              Gotta chase this money, leave the struggle behind
              New whip, new chain, everything on shine"
Genres:      hip hop, trap, gangsta rap, rap, boom bap, dirty south
```
The bio is embedded with Snowflake Arctic for artist search. The verse is embedded with mpnet for lyrics search. The raw query is also embedded for lyrics search (dual search — catches what HyDE misses).
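A minimal sketch of dual search over verse-level chunks, assuming both the HyDE verse and the raw query have already been embedded (toy 3-d vectors stand in for 768-d mpnet output). The brute-force scoring computes the same exact inner products as `IndexFlatIP`. How MoodQueue actually combines the two result lists is not documented; here each song keeps its best score across all of its chunks and both queries, which is one reasonable choice. All function names are hypothetical.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k_inner_product(query, chunks, k):
    """Exact brute-force inner-product search (what IndexFlatIP computes).

    chunks is a list of (song_id, unit_vector) pairs, one per verse chunk.
    """
    scored = [(sum(q * c for q, c in zip(query, vec)), song_id) for song_id, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def dual_lyrics_search(verse_vec, query_vec, chunks, k=100):
    """Search with the HyDE verse and the raw query, then merge by song.

    Assumption: a song's score is its best chunk score under either query.
    """
    best = {}
    for qvec in (verse_vec, query_vec):
        for score, song_id in top_k_inner_product(normalize(qvec), chunks, k):
            if score > best.get(song_id, float("-inf")):
                best[song_id] = score
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy 3-d "embeddings"; real lyrics vectors are 768-d mpnet outputs.
    chunks = [
        ("song_a", normalize([1.0, 0.0, 0.0])),
        ("song_a", normalize([0.9, 0.1, 0.0])),  # second verse of the same song
        ("song_b", normalize([0.0, 1.0, 0.0])),
        ("song_c", normalize([0.0, 0.0, 1.0])),
    ]
    hyde_verse = [1.0, 0.0, 0.0]
    raw_query = [0.0, 0.2, 1.0]
    for song_id, score in dual_lyrics_search(hyde_verse, raw_query, chunks, k=2):
        print(song_id, round(score, 3))
```

In the toy run, `song_c` is missed by the verse search and recovered only by the raw-query search, which is the failure mode dual search is meant to cover.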
Built with LangGraph. Each intent routes the query differently:
| Intent | Example | What happens |
|---|---|---|
| DESCRIBE | "melancholic jazz" | Standard HyDE → search pipeline |
| SEED_ARTIST | "artists like Radiohead" | Prepend seed, HyDE generates in that neighborhood |
| SEED_HYBRID | "like Radiohead but darker" | Seed + modifier combined |
| REFINE | "more upbeat" | LLM resynthesizes full history into new query |
Each turn excludes previously returned tracks. Modifiers accumulate across turns.
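The routing and carry-over rules above can be sketched as plain state updates. This is not the LangGraph graph itself: `ConversationState`, `build_query`, and `exclude_seen` are hypothetical names, and `resynthesize` is a deterministic stand-in for the LLM call that rewrites the history into a single query.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConversationState:
    """Hypothetical per-session state carried between turns."""
    base_query: str = ""
    modifiers: list[str] = field(default_factory=list)
    returned_ids: set[str] = field(default_factory=set)


def resynthesize(base_query: str, modifiers: list[str]) -> str:
    """Stand-in for the LLM call that folds the history into one query."""
    return ", ".join([base_query, *modifiers])


def build_query(state: ConversationState, intent: str, text: str,
                seed_artist: Optional[str] = None) -> str:
    if intent == "DESCRIBE":
        # New description: pass through and reset accumulated modifiers.
        state.base_query, state.modifiers = text, []
        return text
    if intent in ("SEED_ARTIST", "SEED_HYBRID"):
        # Prepend the seed artist so HyDE writes documents in its neighborhood.
        state.base_query, state.modifiers = f"{seed_artist}: {text}", []
        return state.base_query
    if intent == "REFINE":
        # Modifiers accumulate; the query is rebuilt from the whole history.
        state.modifiers.append(text)
        return resynthesize(state.base_query, state.modifiers)
    raise ValueError(f"unknown intent: {intent}")


def exclude_seen(state: ConversationState, ranked_ids: list[str],
                 limit: int = 20) -> list[str]:
    """Drop tracks returned in earlier turns, then remember the new ones."""
    fresh = [t for t in ranked_ids if t not in state.returned_ids][:limit]
    state.returned_ids.update(fresh)
    return fresh
```

Resetting the modifiers on DESCRIBE and SEED turns is an assumption of this sketch; the README only says that modifiers accumulate across turns.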
- Python 3.11+
- uv package manager
- Data dumps in `data-pipeline/sources/` (MusicBrainz, Discogs, AcousticBrainz, Genius lyrics)
- API keys in `.env`: `LASTFM_API_KEY`, `LASTFM_API_SECRET`, `OPENROUTER_API_KEY`
```bash
cd data-pipeline
uv sync --extra index

# Build dev artist list (1000 artists: 200 from Last.fm + 800 genre-diverse)
uv run python scripts/build_dev_list.py

# Run ETL, crawl, and index build
uv run moodqueue-pipeline etl     # Parse dumps → Parquet → DuckDB
uv run moodqueue-pipeline crawl   # Last.fm similarity + tags + tracks
uv run moodqueue-pipeline index   # Build 4 FAISS indexes
```
```bash
cd retrieval
uv sync
uv run moodqueue-retrieval query "melancholic late night jazz" --no-uri
uv run moodqueue-retrieval query "songs about getting rich" --debug
```
```bash
cd conversation
uv sync
uv run moodqueue-chat
```
```bash
cd demo
uv sync
uv run python app.py
# Opens browser at http://localhost:7860
```
| # | Artist | Song | Found by |
|---|---|---|---|
| 1 | Nirvana | Aneurysm | 🎤 artist + 🎵 audio |
| 2 | The Smashing Pumpkins | Cinnamon Girl | 🎤 artist |
| 3 | Linkin Park | And One | 📝 lyrics |
| 4 | Dark Tranquillity | Static | 📝 lyrics |
| 5 | Opeth | Wreath | 📝 lyrics |
| 6 | Agalloch | Fire Above Ice Below | 📝 lyrics |
| # | Artist | Song | Found by |
|---|---|---|---|
| 1 | Public Enemy | Fight the Power | 🎤 artist + 🎵 audio |
| 2 | Pete Rock | Back on da Block | 🎤 artist + 🎵 audio |
| 3 | 2Pac | Got My Mind Made Up | 📝 lyrics |
| 4 | Snoop Dogg | Nuthin' but a "G" Thang | 🎤 artist + 🎵 audio |
| 5 | Kendrick Lamar | Kurupted | 📝 lyrics |
| # | Artist | Song | Found by |
|---|---|---|---|
| 1 | The Smiths | The Hand That Rocks the Cradle | 🎤 artist |
| 2 | CHVRCHES | Really Gone | 🎤 artist + 🎵 audio |
| 3 | Killing Joke | Goodbye to the Village | 📝 lyrics |
| 4 | twenty one pilots | Redecorate | 📝 lyrics |
| 5 | Simon & Garfunkel | Homeward Bound | 📝 lyrics |
| Layer | Technology | Why |
|---|---|---|
| Data pipeline | Prefect, DuckDB, Parquet | Orchestrated ETL with columnar storage. Zero-config embedded database. |
| Artist embeddings | Snowflake Arctic Embed L v2 (1024d) | Top-tier retrieval quality. +9 MTEB points over mpnet. |
| Lyrics embeddings | all-mpnet-base-v2 (768d) | Fast, MPS-friendly for 400K+ chunks. |
| Graph learning | GAT (PyTorch Geometric) | 2-layer, 4-head attention. Propagates signal through Last.fm similarity graph. |
| Vector search | FAISS (IndexFlatIP) | Exact inner-product search; equals cosine similarity on L2-normalized vectors. Fast at current scale. |
| Query understanding | HyDE + Gemini Flash 2.5 | Hypothetical document generation bridges query↔index distribution gap. |
| Merge strategy | Weighted Reciprocal Rank Fusion | Score-agnostic multi-source merge. Lyrics path weighted 2x. |
| Reranking | Two-stage (rules + LLM) | Prefilter enforces diversity, LLM sequences for playlist flow. |
| Conversation | LangGraph | Typed state machine with intent routing and stateless refinement. |
| Demo UI | Gradio | Chat interface with pipeline visualization. |
| URI resolution | Odesli + iTunes | ISRC → streaming links at query time. |
| Package management | uv | Fast, deterministic Python dependency resolution. |
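Stage 1 of the reranker (the rule-based prefilter) is simple enough to sketch directly. The cap of two tracks per artist comes from the architecture diagram; what counts as a duplicate title is an assumption here: titles are compared per artist after casefolding and stripping parenthetical suffixes such as "(Live)" or "[Remastered]". Function names are hypothetical.

```python
import re
from collections import Counter

MAX_PER_ARTIST = 2  # from the architecture diagram


def title_key(title):
    """Normalize a title: drop "(...)" / "[...]" parts, casefold, squeeze spaces."""
    stripped = re.sub(r"[(\[].*?[)\]]", "", title)
    return " ".join(stripped.casefold().split())


def rule_prefilter(candidates, max_per_artist=MAX_PER_ARTIST):
    """Keep candidates in fused-rank order while enforcing diversity rules.

    candidates is a list of (artist, title) pairs, best first.
    """
    per_artist = Counter()
    seen = set()
    kept = []
    for artist, title in candidates:
        key = (artist.casefold(), title_key(title))
        if key in seen or per_artist[key[0]] >= max_per_artist:
            continue  # duplicate recording, or artist already at its cap
        seen.add(key)
        per_artist[key[0]] += 1
        kept.append((artist, title))
    return kept
```

Running the cheap rules before the LLM keeps the sequencing prompt short and guarantees the diversity constraints hold even if the LLM ignores them.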
```
moodqueue/
├── data-pipeline/              # Phase 1: ETL + indexes
│   ├── src/moodqueue_pipeline/
│   │   ├── tasks/              # MB, Discogs, AB, lyrics, index build
│   │   ├── flows/              # Prefect flows (ETL, crawl, index)
│   │   └── cli.py              # moodqueue-pipeline CLI
│   ├── scripts/
│   │   └── build_dev_list.py
│   ├── output/
│   │   ├── db/                 # moodqueue.duckdb
│   │   └── indexes/            # 4 FAISS indexes + embeddings
│   └── data-pipeline.md        # Architecture doc
│
├── retrieval/                  # Phase 2: Search engine
│   ├── src/moodqueue_retrieval/
│   │   ├── search/             # path_a, path_b, rrf, exclusion
│   │   ├── hyde.py             # HyDE generation + genre mapping
│   │   ├── embed.py            # Dual model embedding
│   │   ├── reranker.py         # Two-stage reranker
│   │   ├── pipeline.py         # retrieve() orchestrator
│   │   └── cli.py              # moodqueue-retrieval CLI
│   └── retrieval-engine.md     # Architecture doc
│
├── conversation/               # Phase 3: Multi-turn chat
│   ├── src/moodqueue_conversation/
│   │   ├── nodes/              # intent, query_builder, retriever, state_updater
│   │   ├── graph.py            # LangGraph definition
│   │   └── cli.py              # moodqueue-chat REPL
│   └── conversation.md         # Architecture doc
│
├── demo/                       # Phase 4: Gradio UI
│   └── app.py                  # Chat interface + pipeline details
│
└── docs/
    ├── moodqueue-architecture-unified.md
    └── TASKS.md
```
Dev mode (1000 artists, 1.5M recordings, 400K lyrics chunks):
| Metric | Value |
|---|---|
| Cold start (load 2 models + 4 indexes) | ~12s |
| Query latency (warm, no URI) | ~10-14s |
| Query latency (warm, with URI) | ~14-18s |
| LLM cost per turn | ~$0.0005 |
| Memory footprint | ~3.2GB |
All LLM calls use Gemini Flash 2.5 via OpenRouter:
| Per turn | Cost |
|---|---|
| Intent classifier | ~$0.00005 |
| Query builder (REFINE) | ~$0.0001 |
| HyDE generation | ~$0.0002 |
| Reranker | ~$0.0002 |
| Total | ~$0.0005 |
A 5-turn conversation costs ~$0.0025. At scale: 1M conversations (5 turns each) ≈ $2,500.
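The per-turn figure can be checked with exact decimal arithmetic. Because the query-builder call only runs on REFINE turns, a turn costs between $0.00045 and $0.00055 using the table's numbers; the ~$0.0005 figure and the $2,500 projection use the rounded value. Function names below are illustrative.

```python
from decimal import Decimal

# Approximate per-call costs (USD) copied from the table above.
INTENT_CLASSIFIER = Decimal("0.00005")
QUERY_BUILDER_REFINE = Decimal("0.0001")  # only paid on REFINE turns
HYDE_GENERATION = Decimal("0.0002")
RERANKER = Decimal("0.0002")


def turn_cost(is_refine: bool) -> Decimal:
    """Cost of one turn; only REFINE turns call the query-builder LLM."""
    cost = INTENT_CLASSIFIER + HYDE_GENERATION + RERANKER
    return cost + QUERY_BUILDER_REFINE if is_refine else cost


def conversation_cost(refine_flags: list[bool]) -> Decimal:
    """Sum exact per-turn costs; one flag per turn (True = REFINE)."""
    return sum((turn_cost(flag) for flag in refine_flags), Decimal("0"))


def fleet_cost(n_conversations: int, turns_each: int = 5,
               per_turn: Decimal = Decimal("0.0005")) -> Decimal:
    """Back-of-envelope estimate using the rounded ~$0.0005/turn figure."""
    return per_turn * turns_each * n_conversations


if __name__ == "__main__":
    print(turn_cost(False))                                  # 0.00045
    print(turn_cost(True))                                   # 0.00055
    print(conversation_cost([False, True, True, True, True]))  # 0.00265
    print(fleet_cost(1_000_000))                             # 2500.0000
```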
License: MIT
Built by @ggapp1