ggapp1/moodqueue


MoodQueue

Natural language to playlist — a multi-modal RAG system that turns mood descriptions into curated playlists

"melancholic late night jazz with cigarette smoke energy" → 20 tracks with streaming links

Architecture · How It Works · Quick Start · Example Results · Tech Stack


What is this?

MoodQueue converts expressive natural language queries into curated playlists by searching across four parallel vector indexes — artist text embeddings, graph-enhanced artist embeddings, audio feature vectors, and verse-level lyrics embeddings — then merging results with Reciprocal Rank Fusion and sequencing the final playlist with an LLM reranker.

It handles queries that traditional playlist generators can't:

| Query | What makes it hard | What MoodQueue does |
|---|---|---|
| "melancholic late night jazz with cigarette smoke energy" | No tag or genre captures "cigarette smoke energy" | HyDE generates a synthetic artist bio + verse, embeds them, searches 4 indexes |
| "songs about getting rich, hustle, success" | Thematic: needs lyric understanding | Dual lyrics search (HyDE verse + raw query) finds "Big Rich Town", "Dirtee Cash" |
| "angry 90s grunge, like screaming into the void" | Combines mood + era + genre + metaphor | Genre-aware HyDE generates grunge-style lyrics, genre one-hot filters out non-rock |
| "more upbeat, add some electronic" | Requires conversation memory | LangGraph accumulates modifiers, resynthesizes query, excludes prior tracks |

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                           User Query                                    │
│                "melancholic late night jazz"                             │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │   Intent Classifier      │
                    │   (Gemini Flash 2.5)     │
                    │   DESCRIBE / SEED /       │
                    │   REFINE                  │
                    └────────────┬─────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │   Query Builder           │
                    │   DESCRIBE → pass through │
                    │   SEED → prepend artist   │
                    │   REFINE → resynthesize   │
                    └────────────┬─────────────┘
                                 │
          ┌──────────────────────▼──────────────────────┐
          │              HyDE Generation                │
          │                                             │
          │  LLM generates:                             │
          │    • Artist bio (factual, genre-specific)   │
          │    • Song verse (genre-aware style)         │
          │    • Mood vector (11-dim)                   │
          │    • Genre labels (5-10, mapped to one-hot) │
          └──────────┬──────────────────────────────────┘
                     │
      ┌──────────────┼──────────────┬───────────────────┐
      │              │              │                    │
      ▼              ▼              ▼                    ▼
┌───────────┐ ┌───────────┐ ┌────────────┐   ┌──────────────────┐
│ Snowflake │ │ GAT proj  │ │ mood+genre │   │  mpnet (768d)    │
│ (1024d)   │ │ (128d)    │ │ (31d)      │   │  HyDE verse +    │
│           │ │           │ │            │   │  raw query       │
└─────┬─────┘ └─────┬─────┘ └─────┬──────┘   └────────┬─────────┘
      │              │              │                    │
      ▼              ▼              ▼                    ▼
┌─────────────────────────┐  ┌──────────┐   ┌──────────────────┐
│   Path A: Artist Funnel │  │ Path B   │   │  Path B: Lyrics  │
│                         │  │ Audio    │   │                  │
│  Artist FAISS (text+GAT)│  │ Song AB  │   │  Lyrics FAISS    │
│  → top 50 artists       │  │ FAISS    │   │  (dual search)   │
│  → mood-ranked songs    │  │ → top 100│   │  → ~200 songs    │
│  → 100 candidates       │  │          │   │                  │
└────────────┬────────────┘  └────┬─────┘   └────────┬─────────┘
             │                    │                    │
             └────────┬───────────┘                    │
                      │         ┌──────────────────────┘
                      ▼         ▼
              ┌───────────────────────┐
              │  Weighted RRF Merge   │
              │  lyrics=0.50          │
              │  artist=0.25          │
              │  audio=0.25           │
              │  ~400 → top 50        │
              └───────────┬───────────┘
                          │
              ┌───────────▼───────────┐
              │  Two-Stage Reranker   │
              │                       │
              │  1. Rule filter:      │
              │     max 2/artist,     │
              │     dedup titles      │
              │  2. LLM sequences     │
              │     for playlist flow │
              └───────────┬───────────┘
                          │
              ┌───────────▼───────────┐
              │  URI Resolution       │
              │  ISRC → Odesli API    │
              │  Fallback: iTunes     │
              │  → Spotify / Apple /  │
              │    YouTube links      │
              └───────────────────────┘
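The Weighted RRF Merge box in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the path weights come from the diagram, while the function name `weighted_rrf`, the smoothing constant `k=60`, and the tie-breaking rule are assumptions.

```python
from collections import defaultdict

# Path weights taken from the diagram above.
WEIGHTS = {"lyrics": 0.50, "artist": 0.25, "audio": 0.25}
RRF_K = 60  # assumed smoothing constant; the README does not state one


def weighted_rrf(ranked_lists, weights=WEIGHTS, k=RRF_K, top_n=50):
    """Merge ranked lists of track IDs into one list of (track_id, score).

    ranked_lists maps a path name to track IDs ordered best-first.
    Each appearance contributes weight / (k + rank), with rank starting at 1.
    """
    scores = defaultdict(float)
    for path, track_ids in ranked_lists.items():
        w = weights[path]
        for rank, track_id in enumerate(track_ids, start=1):
            scores[track_id] += w / (k + rank)
    # Highest score first; ties broken by track ID so output is deterministic.
    merged = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return merged[:top_n]


# Toy input: three short candidate lists, one per search path.
demo = {
    "lyrics": ["t3", "t1", "t9"],
    "artist": ["t1", "t2"],
    "audio": ["t2", "t1"],
}
for track_id, score in weighted_rrf(demo, top_n=3):
    print(track_id, round(score, 5))
```

Because only ranks enter the formula, the raw similarity scores from the 1024d, 31d, and 768d indexes never have to be put on a common scale.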

How It Works

1. Data Pipeline

Transforms 90GB of raw music data into a DuckDB warehouse and four FAISS vector indexes:

| Source | What we extract | Volume |
|---|---|---|
| MusicBrainz | Artists, recordings, tags, relationships, ISRCs | 2.8M artists, 38M recordings |
| Discogs | Genre taxonomy (16 genres, 757 styles) | 79% artist coverage |
| AcousticBrainz | Per-recording mood/audio classifiers (14 features) | 6.8M recordings |
| Kaggle/Genius | Verse-chunked lyrics | 5.9M songs |
| Last.fm | Artist similarity graph, tags, play counts | 247K edges (dev) |

2. Four FAISS Indexes

| Index | Dim | Model | What it captures |
|---|---|---|---|
| Artist text | 1024 | Snowflake Arctic Embed L v2 | "Who is this artist?": tags, genres, styles, recording tag distributions |
| Artist GAT | 128 | Graph Attention Network | Same + neighborhood signal from similarity graph. Sparse artists inherit from neighbors. |
| Song AB | 31 | AcousticBrainz + genre one-hot | "What does this song sound like?": 11 mood dims + 20 genre categories |
| Lyrics | 768 | all-mpnet-base-v2 | "What is this song about?": verse-level semantic matching |
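To make the 31-dimensional Song AB layout concrete, here is a rough sketch of how 11 mood scores and 20 genre flags could be concatenated into one vector. The mood and genre names below are hypothetical placeholders (the README gives only the counts), and `build_song_vector` is an illustrative helper, not a function from the repository.

```python
# Hypothetical dimension names; only the counts (11 + 20) come from the README.
MOOD_DIMS = (
    "sad", "happy", "relaxed", "aggressive", "party", "acoustic",
    "electronic", "danceable", "dark", "bright", "instrumental",
)
GENRES = (
    "rock", "pop", "jazz", "hip hop", "electronic", "classical", "folk",
    "country", "blues", "reggae", "latin", "metal", "punk", "soul",
    "funk", "ambient", "world", "rnb", "soundtrack", "other",
)


def build_song_vector(moods, genres):
    """Return a 31-d list: 11 mood scores followed by 20 genre flags.

    moods maps a mood name to a score in [0, 1]; missing moods default to 0.
    genres is an iterable of genre names; several may be set at once.
    """
    genre_set = set(genres)
    unknown = genre_set - set(GENRES)
    if unknown:
        raise ValueError(f"unknown genres: {sorted(unknown)}")
    mood_part = [float(moods.get(name, 0.0)) for name in MOOD_DIMS]
    genre_part = [1.0 if name in genre_set else 0.0 for name in GENRES]
    return mood_part + genre_part


vec = build_song_vector({"sad": 0.7, "relaxed": 0.6}, ["jazz"])
print(len(vec))
```

The same layout would apply on the query side, where HyDE emits a mood vector plus genre labels, so query and song vectors are directly comparable.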

3. HyDE (Hypothetical Document Embeddings)

User queries ("cigarette smoke energy") and indexed content ("Tags: jazz, cool jazz. Mood: sad=0.7") come from different distributions, so their embeddings rarely land close together. HyDE bridges the gap by having the LLM generate synthetic documents that look like what is actually in the index:

Query:    "songs about getting rich, hustle"
HyDE bio: "A trap and hip-hop artist known for aggressive, boastful
           lyrics over heavy 808s. Themes of wealth accumulation,
           street life, and ambition."
HyDE verse: "Woke up this morning, paper on my mind
             Gotta chase this money, leave the struggle behind
             New whip, new chain, everything on shine"
Genres:   hip hop, trap, gangsta rap, rap, boom bap, dirty south

The bio is embedded with Snowflake Arctic for artist search. The verse is embedded with mpnet for lyrics search. The raw query is also embedded for lyrics search (dual search — catches what HyDE misses).
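The dual lyrics search can be illustrated with a toy example. The sketch below assumes the two result lists are merged by keeping each song's best score; the README does not say how the merge is done. The brute-force `search` is a stand-in for a FAISS lookup, and all names and the tiny 2-d vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_vec, top_k):
    """Brute-force stand-in for a vector index lookup.

    index maps song_id to its embedding; returns [(song_id, score)] best-first.
    """
    scored = [(song_id, cosine(vec, query_vec)) for song_id, vec in index.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


def dual_lyrics_search(index, hyde_verse_vec, raw_query_vec, top_k=100):
    """Search with both query vectors and keep each song's best score."""
    best = {}
    for query_vec in (hyde_verse_vec, raw_query_vec):
        for song_id, score in search(index, query_vec, top_k):
            if score > best.get(song_id, float("-inf")):
                best[song_id] = score
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Toy index: "a" matches only the HyDE verse, "b" only the raw query.
toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = dual_lyrics_search(toy_index, [1.0, 0.0], [0.0, 1.0], top_k=2)
print([song_id for song_id, _ in hits])
```

In the toy data, song "b" would be missed by the HyDE verse alone and song "a" by the raw query alone, which is the situation the second search is meant to cover.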

4. Multi-Turn Conversation

Built with LangGraph. Each intent routes the query differently:

| Intent | Example | What happens |
|---|---|---|
| DESCRIBE | "melancholic jazz" | Standard HyDE → search pipeline |
| SEED_ARTIST | "artists like Radiohead" | Prepend seed, HyDE generates in that neighborhood |
| SEED_HYBRID | "like Radiohead but darker" | Seed + modifier combined |
| REFINE | "more upbeat" | LLM resynthesizes full history into new query |

Each turn excludes previously returned tracks. Modifiers accumulate across turns.
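The two rules above (accumulate modifiers, exclude earlier tracks) can be sketched in plain Python without LangGraph. Everything here is an assumption made for illustration: the `ConversationState` fields, resetting modifiers on a non-REFINE turn, and string concatenation standing in for the LLM resynthesis step.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationState:
    """Hypothetical per-conversation state carried between turns."""
    base_query: str = ""
    modifiers: list = field(default_factory=list)
    seen_track_ids: set = field(default_factory=set)


def apply_turn(state, intent, text):
    """Update the state for one user turn and return the query to search with."""
    if intent == "REFINE":
        # Refinements pile up on top of the current base query.
        state.modifiers.append(text)
    else:
        # Assumption: any other intent starts a fresh base query.
        state.base_query = text
        state.modifiers.clear()
    # Stand-in for the LLM resynthesis step: plain concatenation.
    return ", ".join([state.base_query, *state.modifiers])


def filter_new_tracks(state, candidate_ids):
    """Drop tracks returned in earlier turns and remember the new ones."""
    fresh = [t for t in candidate_ids if t not in state.seen_track_ids]
    state.seen_track_ids.update(fresh)
    return fresh


state = ConversationState()
print(apply_turn(state, "DESCRIBE", "melancholic jazz"))
print(filter_new_tracks(state, ["t1", "t2"]))
print(apply_turn(state, "REFINE", "more upbeat"))
print(filter_new_tracks(state, ["t2", "t3"]))
```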


Quick Start

Prerequisites

  • Python 3.11+
  • uv package manager
  • Data dumps in data-pipeline/sources/ (MusicBrainz, Discogs, AcousticBrainz, Genius lyrics)
  • API keys in .env: LASTFM_API_KEY, LASTFM_API_SECRET, OPENROUTER_API_KEY

Build the data pipeline

cd data-pipeline
uv sync --extra index

# Build dev artist list (1000 artists: 200 from Last.fm + 800 genre-diverse)
uv run python scripts/build_dev_list.py

# Run ETL, crawl, and index build
uv run moodqueue-pipeline etl        # Parse dumps → Parquet → DuckDB
uv run moodqueue-pipeline crawl      # Last.fm similarity + tags + tracks
uv run moodqueue-pipeline index      # Build 4 FAISS indexes

Run a single query

cd retrieval
uv sync
uv run moodqueue-retrieval query "melancholic late night jazz" --no-uri
uv run moodqueue-retrieval query "songs about getting rich" --debug

Start the conversation chat

cd conversation
uv sync
uv run moodqueue-chat

Launch the demo UI

cd demo
uv sync
uv run python app.py
# Opens browser at http://localhost:7860

Example Results

"angry 90s grunge rock, like screaming into the void"

| # | Artist | Song | Found by |
|---|---|---|---|
| 1 | Nirvana | Aneurysm | 🎤 artist + 🎵 audio |
| 2 | The Smashing Pumpkins | Cinnamon Girl | 🎤 artist |
| 3 | Linkin Park | And One | 📝 lyrics |
| 4 | Dark Tranquillity | Static | 📝 lyrics |
| 5 | Opeth | Wreath | 📝 lyrics |
| 6 | Agalloch | Fire Above Ice Below | 📝 lyrics |

"hard hitting east coast hip hop, boom bap beats"

| # | Artist | Song | Found by |
|---|---|---|---|
| 1 | Public Enemy | Fight the Power | 🎤 artist + 🎵 audio |
| 2 | Pete Rock | Back on da Block | 🎤 artist + 🎵 audio |
| 3 | 2Pac | Got My Mind Made Up | 📝 lyrics |
| 4 | Snoop Dogg | Nuthin' but a "G" Thang | 🎤 artist + 🎵 audio |
| 5 | Kendrick Lamar | Kurupted | 📝 lyrics |

"songs about moving cities, like Disorder by Joy Division"

| # | Artist | Song | Found by |
|---|---|---|---|
| 1 | The Smiths | The Hand That Rocks the Cradle | 🎤 artist |
| 2 | CHVRCHES | Really Gone | 🎤 artist + 🎵 audio |
| 3 | Killing Joke | Goodbye to the Village | 📝 lyrics |
| 4 | twenty one pilots | Redecorate | 📝 lyrics |
| 5 | Simon & Garfunkel | Homeward Bound | 📝 lyrics |

Tech Stack

| Layer | Technology | Why |
|---|---|---|
| Data pipeline | Prefect, DuckDB, Parquet | Orchestrated ETL with columnar storage. Zero-config embedded database. |
| Artist embeddings | Snowflake Arctic Embed L v2 (1024d) | Top-tier retrieval quality. +9 MTEB points over mpnet. |
| Lyrics embeddings | all-mpnet-base-v2 (768d) | Fast, MPS-friendly for 400K+ chunks. |
| Graph learning | GAT (PyTorch Geometric) | 2-layer, 4-head attention. Propagates signal through Last.fm similarity graph. |
| Vector search | FAISS (IndexFlatIP) | Exact inner-product search (cosine similarity when vectors are L2-normalized). Fast at current scale. |
| Query understanding | HyDE + Gemini Flash 2.5 | Hypothetical document generation bridges query↔index distribution gap. |
| Merge strategy | Weighted Reciprocal Rank Fusion | Score-agnostic multi-source merge. Lyrics path weighted 2x. |
| Reranking | Two-stage (rules + LLM) | Prefilter enforces diversity, LLM sequences for playlist flow. |
| Conversation | LangGraph | Typed state machine with intent routing and stateless refinement. |
| Demo UI | Gradio | Chat interface with pipeline visualization. |
| URI resolution | Odesli + iTunes | ISRC → streaming links at query time. |
| Package management | uv | Fast, deterministic Python dependency resolution. |
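The rule-based first stage of the reranker (max 2 tracks per artist, deduplicated titles) might look roughly like the sketch below. The track dictionaries, the `rule_filter` name, and the choice to deduplicate on a case-folded title alone are assumptions; the repository may key duplicates differently.

```python
def rule_filter(tracks, max_per_artist=2):
    """Keep rank order, drop duplicate titles, and cap tracks per artist.

    tracks is a best-first list of dicts with "artist" and "title" keys.
    """
    kept = []
    per_artist = {}
    seen_titles = set()
    for track in tracks:
        artist_key = track["artist"].strip().casefold()
        title_key = track["title"].strip().casefold()
        if title_key in seen_titles:
            continue  # same title already in the playlist
        if per_artist.get(artist_key, 0) >= max_per_artist:
            continue  # this artist already has the maximum number of slots
        seen_titles.add(title_key)
        per_artist[artist_key] = per_artist.get(artist_key, 0) + 1
        kept.append(track)
    return kept


# Placeholder candidates, ordered best-first as they would leave the RRF merge.
candidates = [
    {"artist": "Artist A", "title": "Song 1"},
    {"artist": "Artist A", "title": "Song 2"},
    {"artist": "Artist A", "title": "Song 3"},   # third track by Artist A
    {"artist": "Artist B", "title": "song 1 "},  # duplicate title
    {"artist": "Artist C", "title": "Song 4"},
]
for track in rule_filter(candidates):
    print(track["artist"], "-", track["title"])
```

Running the cheap rules first means the LLM only has to sequence an already diverse shortlist.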

Project Structure

moodqueue/
├── data-pipeline/           # Phase 1: ETL + indexes
│   ├── src/moodqueue_pipeline/
│   │   ├── tasks/           # MB, Discogs, AB, lyrics, index build
│   │   ├── flows/           # Prefect flows (ETL, crawl, index)
│   │   └── cli.py           # moodqueue-pipeline CLI
│   ├── scripts/
│   │   └── build_dev_list.py
│   ├── output/
│   │   ├── db/              # moodqueue.duckdb
│   │   └── indexes/         # 4 FAISS indexes + embeddings
│   └── data-pipeline.md     # Architecture doc
│
├── retrieval/               # Phase 2: Search engine
│   ├── src/moodqueue_retrieval/
│   │   ├── search/          # path_a, path_b, rrf, exclusion
│   │   ├── hyde.py          # HyDE generation + genre mapping
│   │   ├── embed.py         # Dual model embedding
│   │   ├── reranker.py      # Two-stage reranker
│   │   ├── pipeline.py      # retrieve() orchestrator
│   │   └── cli.py           # moodqueue-retrieval CLI
│   └── retrieval-engine.md  # Architecture doc
│
├── conversation/            # Phase 3: Multi-turn chat
│   ├── src/moodqueue_conversation/
│   │   ├── nodes/           # intent, query_builder, retriever, state_updater
│   │   ├── graph.py         # LangGraph definition
│   │   └── cli.py           # moodqueue-chat REPL
│   └── conversation.md      # Architecture doc
│
├── demo/                    # Phase 4: Gradio UI
│   └── app.py               # Chat interface + pipeline details
│
└── docs/
    ├── moodqueue-architecture-unified.md
    └── TASKS.md

Performance

Dev mode (1000 artists, 1.5M recordings, 400K lyrics chunks):

| Metric | Value |
|---|---|
| Cold start (load 2 models + 4 indexes) | ~12s |
| Query latency (warm, no URI) | ~10-14s |
| Query latency (warm, with URI) | ~14-18s |
| LLM cost per turn | ~$0.0005 |
| Memory footprint | ~3.2GB |

Cost

All LLM calls use Gemini Flash 2.5 via OpenRouter:

| Step | Cost per turn |
|---|---|
| Intent classifier | ~$0.00005 |
| Query builder (REFINE) | ~$0.0001 |
| HyDE generation | ~$0.0002 |
| Reranker | ~$0.0002 |
| Total | ~$0.0005 |

A 5-turn conversation costs ~$0.0025. At scale: 1M conversations (5 turns each) ≈ $2,500.
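The arithmetic behind these figures can be checked with a short script. The per-step costs are the approximate values from the table; the dictionary keys and the `conversation_cost` helper are illustrative names. Note that the four steps sum to slightly more than the rounded ~$0.0005 total, and the scale estimate uses the rounded figure.

```python
from decimal import Decimal

# Approximate per-step costs in USD, copied from the table above.
STEP_COST_USD = {
    "intent_classifier": Decimal("0.00005"),
    "query_builder_refine": Decimal("0.0001"),
    "hyde_generation": Decimal("0.0002"),
    "reranker": Decimal("0.0002"),
}
ROUNDED_PER_TURN = Decimal("0.0005")  # the rounded total quoted in the README


def conversation_cost(turns, per_turn=ROUNDED_PER_TURN):
    """Cost in USD of one conversation with the given number of turns."""
    return per_turn * turns


unrounded_per_turn = sum(STEP_COST_USD.values())
five_turns = conversation_cost(5)
at_scale = five_turns * 1_000_000  # 1M conversations of 5 turns each

print("unrounded per turn:", unrounded_per_turn)
print("5-turn conversation:", five_turns)
print("1M conversations:", at_scale)
```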


License

MIT


Built by @ggapp1
