AI-powered album cover recognition system using PyTorch, FastAPI, and Next.js.
- 🎵 Album cover recognition using deep learning
- 🖼️ Drag & drop image upload interface
- 📊 Confidence scores for top 5 predictions
- 🎨 Modern, responsive UI with dark mode support
- 🚀 FastAPI backend with PyTorch model serving
- 📦 Transfer learning with ResNet50
discogs-sage-app/
├── backend/ # FastAPI + PyTorch (works locally and in SageMaker)
│ ├── data/ # Data pipeline: XML parser, image downloader
│ ├── ml/ # Training & inference (shared)
│ ├── scripts/ # CLI: build_data, train
│ ├── main.py # FastAPI service
│ ├── train.py # SageMaker training entry point
│ └── inference.py # SageMaker inference entry point
├── frontend/ # Next.js frontend
├── infrastructure/ # CDK: API Gateway, Lambda, SageMaker
├── docs/ # Documentation
└── data/ # Manifest, images (generated by build_data)
This project requires Python 3.10+ (for PyTorch). Using pyenv lets you install and switch Python versions easily.
Install pyenv
- macOS (Homebrew):
  brew install pyenv
  Add to ~/.zshrc (or ~/.bash_profile):
  export PYENV_ROOT="$HOME/.pyenv"
  [[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
  eval "$(pyenv init -)"
  Then run exec "$SHELL" or open a new terminal.
- Linux (Ubuntu/Debian):
  curl https://pyenv.run | bash
  Add the same three lines above to ~/.bashrc, then run exec "$SHELL".
Install and use Python 3.11
pyenv install 3.11.9
pyenv local 3.11.9 # Use 3.11 for this project
Verify: python --version should show 3.11.9.
- Navigate to the backend directory:
  cd backend
- Create a virtual environment (uses Python from pyenv if set):
  python -m venv venv
  source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
  If you get "No matching distribution found for torch", install PyTorch first:
  pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
  pip install -r requirements.txt
  Or run ./setup.sh, which installs PyTorch before the other dependencies.
- Build data and train (see docs/backend.md). As written, both commands use paths relative to the repository root, so run cd .. first if you are still in backend:
  python backend/scripts/build_data.py --count 500
  cd backend && python -m scripts.train --data-dir ../data --model-dir models
- Start the backend (from the backend directory):
  python main.py
  The API will be available at http://localhost:8000
- Navigate to the frontend directory:
  cd frontend
- Install dependencies:
  npm install
- Start the development server:
  npm run dev
  The app will be available at http://localhost:3000
First, download album cover images from Discogs:
curl -X POST http://localhost:8000/api/download-images
This downloads images for the first 50 releases in your manifest file.
Train the classification model:
curl -X POST http://localhost:8000/api/train
This trains a ResNet50 model using transfer learning on the downloaded images. Training takes a few minutes.
- Open http://localhost:3000 in your browser
- Upload or drag & drop an album cover image
- Click "Identify Album"
- View the top 5 predictions with confidence scores
- GET / - API status
- GET /api/releases - List all releases in dataset
- GET /api/health - Health check
- POST /api/download-images - Download images from Discogs
- POST /api/train - Train classification model
- POST /api/predict - Predict album from uploaded image
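For callers that prefer Python over curl, the upload to POST /api/predict can be sketched with the standard library alone. This is a minimal sketch, not the project's client: the helper names build_multipart and predict are hypothetical, the form field name "file" is taken from the curl example elsewhere in this README, and the assumption that the endpoint answers with JSON is not confirmed by the source.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode a single file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path: str, base_url: str = "http://localhost:8000") -> object:
    """POST an image to /api/predict and return the decoded response.

    Assumption: the response body is JSON; its exact shape is not
    documented here, so it is returned as-is.
    """
    path = Path(image_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/api/predict",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the backend running, predict("album_cover.jpg") would send the same request as the curl command.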
- FastAPI: Modern Python web framework
- PyTorch: Deep learning framework
- ResNet50: Pre-trained CNN for image classification
- Transfer Learning: Fine-tuned on album covers
- Next.js 15: React framework with App Router
- TypeScript: Type-safe development
- Tailwind CSS: Utility-first styling
- Modern UI: Drag & drop, dark mode, responsive
- Base Model: ResNet50 pre-trained on ImageNet
- Architecture: Transfer learning with frozen early layers
- Input: 224x224 RGB images
- Output: Softmax probabilities over 50 album classes
- Training: Data augmentation with horizontal flips and color jitter
- Optimizer: Adam with learning rate scheduling
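How raw model outputs become the top 5 confidence scores can be illustrated without PyTorch. The following is a plain-Python sketch of softmax followed by a top-k selection; the function names softmax and top_k and the album labels are hypothetical, and the project itself presumably does the equivalent with PyTorch tensors.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: list[float], labels: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical scores for 50 classes; class 7 is the clear favourite.
    scores = [0.0] * 50
    scores[7] = 10.0
    names = [f"album_{i}" for i in range(50)]
    for name, prob in top_k(scores, names):
        print(f"{name}: {prob:.4f}")
```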
- DISCOGS_USER_TOKEN: (Preferred) Personal access token from Discogs Developers → Generate new token. Uses python3-discogs-client.
- DISCOGS_CONSUMER_KEY + DISCOGS_CONSUMER_SECRET: Alternative auth from Create an application
- MODEL_PATH: Path to trained model (default: ./models/album_classifier.pth)
- IMAGES_PATH: Path to album images (default: ./data/images)
- BUCKET_NAME: S3 bucket for SageMaker
- SAGEMAKER_ROLE: IAM role ARN for SageMaker
- NEXT_PUBLIC_API_URL: Backend API URL (default: http://localhost:8000)
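One way to read the backend variables listed above, with their documented defaults, is a small settings object. This is a sketch only: BackendSettings, load_settings, and discogs_auth_mode are hypothetical names, and how the backend actually reads its environment is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BackendSettings:
    """Hypothetical container for the backend environment variables."""

    model_path: str
    images_path: str
    discogs_user_token: Optional[str]
    discogs_consumer_key: Optional[str]
    discogs_consumer_secret: Optional[str]

    @property
    def discogs_auth_mode(self) -> str:
        # The user token is documented as preferred, so it wins when both are set.
        if self.discogs_user_token:
            return "user_token"
        if self.discogs_consumer_key and self.discogs_consumer_secret:
            return "consumer_key"
        return "none"


def load_settings(env: Mapping[str, str] = os.environ) -> BackendSettings:
    """Build settings from a mapping, falling back to the documented defaults."""
    return BackendSettings(
        model_path=env.get("MODEL_PATH", "./models/album_classifier.pth"),
        images_path=env.get("IMAGES_PATH", "./data/images"),
        discogs_user_token=env.get("DISCOGS_USER_TOKEN"),
        discogs_consumer_key=env.get("DISCOGS_CONSUMER_KEY"),
        discogs_consumer_secret=env.get("DISCOGS_CONSUMER_SECRET"),
    )
```

Passing a plain dict instead of os.environ keeps the function easy to test.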
NumPy "compiled with NumPy 1.x" error – Downgrade: pip install "numpy<2"
SSL certificate verify failed (when downloading ResNet50 weights on macOS):
# Option 1: Run Python's certificate installer (if using python.org installer)
/Applications/Python\ 3.11/Install\ Certificates.command
# Option 2: Use certifi (if using pyenv)
export SSL_CERT_FILE=$(python -c "import certifi; print(certifi.where())")
# Option 3: Install certs for pyenv Python
pip install certifi
python -m certifi
# Then: export SSL_CERT_FILE=/path/from/certifi/output
No matching distribution for torch – See step 3 in Backend setup above.
DNS/network error when downloading ResNet50 (nodename nor servname provided, or not known):
- On a machine with internet, run:
  python backend/scripts/download_resnet50_weights.py
- Copy the file to your offline machine at ~/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
- Or download from https://download.pytorch.org/models/resnet50-0676ba61.pth and set:
  export RESNET50_WEIGHTS_PATH=/path/to/resnet50-0676ba61.pth
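To check which weights file would be picked up on an offline machine, the lookup order described above can be sketched as follows. This is an assumption about the order (RESNET50_WEIGHTS_PATH first, then the cache location named above), and resolve_resnet50_weights is a hypothetical helper, not a function from this repository.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

WEIGHTS_FILENAME = "resnet50-0676ba61.pth"


def resolve_resnet50_weights(
    env: Mapping[str, str] = os.environ,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first existing weights file, or None if none is found.

    Assumed order: the RESNET50_WEIGHTS_PATH override, then the
    ~/.cache/torch/hub/checkpoints location mentioned in this README.
    """
    override = env.get("RESNET50_WEIGHTS_PATH")
    if override:
        candidate = Path(override).expanduser()
        if candidate.is_file():
            return candidate
        # An override pointing at a missing file falls through to the cache.
    base = home if home is not None else Path.home()
    cached = base / ".cache" / "torch" / "hub" / "checkpoints" / WEIGHTS_FILENAME
    return cached if cached.is_file() else None


if __name__ == "__main__":
    found = resolve_resnet50_weights()
    print(found if found else "No local ResNet50 weights found")
```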
# Backend (first terminal)
cd backend
python main.py
# Frontend (second terminal, from the repository root)
cd frontend
npm run dev
# Test prediction with curl
curl -X POST -F "file=@album_cover.jpg" http://localhost:8000/api/predict
This project includes full AWS SageMaker support for production deployment.
# 1. Upload data and code to S3
./prepare_for_studio.sh your-bucket-name
# 2. Train on SageMaker (from Studio notebook or CLI)
# See docs/SAGEMAKER_README.md
# 3. Deploy API Gateway + Lambda
cd infrastructure && npm run deploy -- --context endpointName=album-classifier
- How SageMaker Works - Architecture, flow, key files
- SageMaker Quick Start - 5-step reference
- Complete Setup - Full walkthrough
- Deploy Inference - API Gateway + Lambda
- Expand to full Discogs catalog
- Add batch prediction support
- Implement image similarity search
- Add user feedback for model improvement
- Deploy to AWS SageMaker for production
- Add caching for faster predictions
- Support for multi-image input
MIT