RobinMillford/retail-forecast

🛒 Real-Time Retail Forecasting with RAG-Powered AI

Live Demo | Architecture Diagram


📖 Overview

A production-grade MLOps system that combines traditional ML (XGBoost and Prophet) with retrieval-augmented generation (RAG) for retail demand forecasting. It features real-time data streaming, automated model retraining, and AI-powered analysis of a dataset of 3M+ sales records.

Key Capabilities:

  • 🔄 Live data ingestion (10-min intervals)
  • 🤖 RAG-powered Q&A over sales data (the 500K+ most recent of 3M+ records are indexed)
  • 📊 Dual forecasting (XGBoost + Prophet)
  • 🎛️ What-if scenario analysis
  • 🎨 Premium glassmorphism UI
  • ⚡ Zero-cost serverless infrastructure

๐Ÿ—๏ธ Architecture

[Image: architecture diagram]


🔄 How Everything Works

1. Data Ingestion (Every 10 Minutes)

Kaggle (train.csv) → GitHub Action → producer_batch.py → Redis Stream → feature_store_batch.py → Upstash Redis
  • Downloads 3M+ records from Kaggle
  • Simulates 50 random transactions with current timestamps
  • Pushes to Redis Stream
  • Aggregates into daily/weekly/monthly features
  • Stores in Redis for dashboard
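
The ingestion steps above can be sketched in plain Python. This is only an illustration, not the code in producer_batch.py or feature_store_batch.py: the function names, the record fields, and the in-memory list standing in for the Redis Stream are all assumptions.

```python
import random
from collections import defaultdict
from datetime import datetime, timezone


def simulate_transactions(history, n=50, now=None, rng=None):
    """Sample n historical rows and restamp them with the current time."""
    rng = rng or random.Random()
    now = now or datetime.now(timezone.utc)
    sampled = [rng.choice(history) for _ in range(n)]
    # Copy each row so the historical data is left untouched.
    return [{**row, "timestamp": now.isoformat()} for row in sampled]


def aggregate_daily(events):
    """Sum sales per (date, store_nbr) over a list of stream events."""
    totals = defaultdict(float)
    for event in events:
        day = event["timestamp"][:10]  # ISO timestamps start with YYYY-MM-DD
        totals[(day, event["store_nbr"])] += event["sales"]
    return dict(totals)


# Hypothetical sample rows; the real data comes from train.csv.
history = [
    {"store_nbr": 1, "family": "GROCERY", "sales": 120.0},
    {"store_nbr": 2, "family": "DAIRY", "sales": 45.5},
]
fixed_now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
stream = simulate_transactions(history, n=50, now=fixed_now, rng=random.Random(0))
daily = aggregate_daily(stream)
```

Passing `now` and `rng` explicitly keeps the sketch reproducible; a real producer would likely use the defaults.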

2. Model Training (Nightly)

Historical Data + Redis Buffer → train.py → XGBoost + Prophet → MLflow → Save Models → Git Commit → Auto-Deploy
  • Merges Kaggle data with live Redis buffer
  • Trains XGBoost on 12 features (oil, transactions, store metadata, holidays)
  • Trains Prophet for long-term trends
  • Saves best_model_v2.json, long_term_forecast.pkl, encoders
  • Commits to repo → Streamlit Cloud auto-deploys
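
The merge step at the start of training might look roughly like the sketch below. It assumes rows are keyed by (date, store_nbr, family) and that live rows should override historical ones. Neither assumption is stated in this README, and the chronological hold-out is simply a common choice for time series rather than something train.py is known to do.

```python
def merge_history_with_buffer(history, buffer):
    """Combine historical rows with live rows; live rows win on key conflicts."""
    merged = {}
    for row in history + buffer:  # buffer comes last, so it overwrites history
        key = (row["date"], row["store_nbr"], row["family"])
        merged[key] = row
    # Sort chronologically so a time-based split is possible.
    return [merged[key] for key in sorted(merged)]


def chronological_split(rows, holdout_fraction=0.2):
    """Hold out the most recent rows for validation instead of a random sample."""
    cut = len(rows) - int(len(rows) * holdout_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical rows for illustration only.
history = [
    {"date": "2017-08-12", "store_nbr": 1, "family": "GROCERY", "sales": 100.0},
    {"date": "2017-08-13", "store_nbr": 1, "family": "GROCERY", "sales": 110.0},
    {"date": "2017-08-14", "store_nbr": 1, "family": "GROCERY", "sales": 105.0},
    {"date": "2017-08-15", "store_nbr": 1, "family": "GROCERY", "sales": 98.0},
]
buffer = [
    {"date": "2017-08-15", "store_nbr": 1, "family": "GROCERY", "sales": 99.5},
    {"date": "2017-08-16", "store_nbr": 1, "family": "GROCERY", "sales": 120.0},
]
rows = merge_history_with_buffer(history, buffer)
train_rows, valid_rows = chronological_split(rows)
```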

3. Dashboard Predictions

User Input → Load Models → Encode Features → Fetch Redis Data → XGBoost.predict() → Display Chart
  • User selects store/product/date
  • Loads XGBoost model and encoders
  • Fetches live oil price and transactions from Redis
  • Runs prediction
  • Shows 7-day forecast
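
To make the encode-then-predict flow concrete, here is a small sketch of how the feature rows for a 7-day forecast could be assembled before they are passed to the model. The feature names and the dictionary-based label encoder are assumptions for illustration; the README only states that 12 features are used and that encoders are saved alongside the model.

```python
from datetime import date, timedelta


def fit_label_encoder(values):
    """Map each distinct category to an integer, in sorted order."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def build_forecast_rows(store_nbr, family, start, family_encoder,
                        oil_price, transactions, horizon=7):
    """Build one feature row per forecast day."""
    rows = []
    for offset in range(horizon):
        day = start + timedelta(days=offset)
        rows.append({
            "store_nbr": store_nbr,
            "family": family_encoder[family],  # encoded category
            "day_of_week": day.weekday(),      # Monday is 0
            "month": day.month,
            "oil_price": oil_price,            # would come from Redis
            "transactions": transactions,      # would come from Redis
        })
    return rows


encoder = fit_label_encoder(["GROCERY", "DAIRY", "BEVERAGES"])
rows = build_forecast_rows(25, "GROCERY", date(2024, 1, 1), encoder,
                           oil_price=75.0, transactions=1800)
```

Each row would then be converted to whatever input format the trained model expects; that step is omitted here.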

4. What-If Analysis

User Adjusts (Oil/Promo/Holiday) → Modify Features → XGBoost.predict() → Compare Baseline vs Scenario → Show Impact
  • User tweaks scenario parameters
  • Creates two feature sets (baseline vs scenario)
  • Runs predictions for both
  • Displays side-by-side comparison
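
The baseline-versus-scenario comparison can be sketched as follows. The `toy_predict` function is a made-up stand-in for the trained XGBoost model, and the feature names are assumptions; only the overall pattern (copy the baseline, override a few features, predict twice, compare) reflects the steps above.

```python
def compare_scenario(predict, baseline, overrides):
    """Predict for the baseline and for a modified copy, then report the difference."""
    scenario = {**baseline, **overrides}  # baseline itself is not mutated
    base_pred = predict(baseline)
    scen_pred = predict(scenario)
    change = scen_pred - base_pred
    pct = change / base_pred * 100.0 if base_pred else float("nan")
    return {
        "baseline": base_pred,
        "scenario": scen_pred,
        "change": change,
        "pct_change": pct,
    }


def toy_predict(features):
    """Stand-in for the trained model; the coefficients are invented."""
    return 1000.0 - 2.0 * features["oil_price"] + 150.0 * features["onpromotion"]


baseline = {"oil_price": 80.0, "onpromotion": 0, "is_holiday": 0}
result = compare_scenario(toy_predict, baseline, {"oil_price": 100.0, "onpromotion": 1})
```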

5. Vector DB Build (Automated)

train.csv → Load 500K Recent Records → Embeddings → Pinecone (Cloud) → Daily Updates
  • Loads 500K most recent records
  • Generates text: "Date: 2017-12-25, Store: 5, Product: GROCERY, Sales: $1234"
  • Creates 384-dim embeddings (Sentence Transformers)
  • Uploads to Pinecone via API
  • Daily workflow adds new records automatically
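
As a rough illustration of the text-generation step, the sketch below turns a record into the sentence format quoted above and groups records into batches for upload. The field names and the batch size are assumptions, and the embedding and Pinecone calls are deliberately left out.

```python
def record_to_text(record):
    """Render one sales record in the text format that gets embedded."""
    return (
        f"Date: {record['date']}, Store: {record['store_nbr']}, "
        f"Product: {record['family']}, Sales: ${record['sales']:.0f}"
    )


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


record = {"date": "2017-12-25", "store_nbr": 5, "family": "GROCERY", "sales": 1234.0}
text = record_to_text(record)

# Hypothetical batch size; each batch would be embedded and uploaded together.
batch_sizes = [len(batch) for batch in batched(list(range(250)), 100)]
```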

6. AI Data Analyst (RAG)

Question → Parse Filters → Generate Embedding → Pinecone Search → Retrieve Top-20 → Groq API → Answer
  • User asks: "What were GROCERY sales in store 25?"
  • Extracts filters: {store_nbr: 25, family: GROCERY}
  • Searches 500K+ vectors using semantic similarity
  • Retrieves top 20 matching records from Pinecone
  • Sends to Groq (Llama 3.3 70B) with context
  • Generates answer with citations
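
The filter-parsing and prompt-building steps might be approximated as below. This is a sketch under stated assumptions: the regular expressions, the short list of known product families, and the prompt wording are all hypothetical, and the Pinecone search and Groq call are omitted.

```python
import re

KNOWN_FAMILIES = ("GROCERY", "DAIRY", "BEVERAGES")  # illustrative subset


def parse_filters(question, families=KNOWN_FAMILIES):
    """Extract store number and product family from a natural-language question."""
    filters = {}
    store = re.search(r"\bstore\s+(\d+)\b", question, flags=re.IGNORECASE)
    if store:
        filters["store_nbr"] = int(store.group(1))
    upper = question.upper()
    for family in families:
        if re.search(rf"\b{re.escape(family)}\b", upper):
            filters["family"] = family
            break
    return filters


def build_prompt(question, records):
    """Number the retrieved records so the model can cite them."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(records, start=1))
    return (
        "Answer using only the numbered records below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


question = "What were GROCERY sales in store 25?"
filters = parse_filters(question)
prompt = build_prompt(question, ["Date: 2017-12-25, Store: 25, Product: GROCERY, Sales: $1234"])
```

The extracted filters would typically be applied as metadata constraints on the vector search, so that semantic similarity is only computed over matching records.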

7. App Loading (First Run)

User Visits → Connect Pinecone → Load Models → Connect Redis → Ready!
  • Connects to Pinecone (cloud-hosted)
  • No vector database download needed, since the index is queried remotely
  • Loads ML models from repo
  • Connects to Redis for live data
  • App ready to serve in seconds

🛠️ Tech Stack

| Category | Technologies |
|----------|--------------|
| Data     | Kaggle API, Redis Streams, Upstash Redis |
| ML       | XGBoost, Prophet, Sentence Transformers |
| AI       | Groq (Llama 3.3 70B), Pinecone, Sentence Transformers |
| MLOps    | GitHub Actions, MLflow, Streamlit Cloud |

🌟 Features

1. Real-Time Dashboard

  • Live sales metrics from Redis
  • 7-day XGBoost + 30-day Prophet forecasts
  • Interactive Plotly charts

2. What-If Analysis

  • Simulate oil price changes ($40-$120)
  • Toggle promotions and holidays
  • Instant prediction updates

3. RAG-Powered AI Analyst

  • Natural language queries over 500K+ vectors
  • Cloud-hosted semantic search via Pinecone
  • Sub-2s responses via Groq API

Example Questions:

"What were total GROCERY sales in store 25?"
"Show sales trends for December 2017"
"Which stores had highest sales last week?"

🚀 Quick Start

1. Clone & Install

git clone https://github.com/RobinMillford/retail-forecast.git
cd retail-forecast
pip install -r requirements.txt

2. Configure .env

# Required
UPSTASH_REDIS_REST_URL=your_redis_url
UPSTASH_REDIS_REST_TOKEN=your_redis_token
GROQ_API_KEY=your_groq_key

# For Vector DB (Pinecone)
PINECONE_API_KEY=your_pinecone_key
PINECONE_ENVIRONMENT=us-east-1-aws
PINECONE_INDEX_NAME=retail-sales

# Optional
KAGGLE_USERNAME=your_username
KAGGLE_KEY=your_api_key
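
Before launching the app, it can help to confirm that the required variables are actually set. The helper below is a hypothetical convenience, not part of the repository; it only checks the three variables marked as required above.

```python
import os

REQUIRED_VARS = (
    "UPSTASH_REDIS_REST_URL",
    "UPSTASH_REDIS_REST_TOKEN",
    "GROQ_API_KEY",
)


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# Example with an explicit mapping instead of the real process environment.
example_missing = missing_env_vars(environ={"GROQ_API_KEY": "gsk_example"})
```

Calling `missing_env_vars()` with no arguments checks the real process environment, which assumes the .env file has already been loaded into it.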

3. Run

streamlit run dashboard.py

4. Upload Data to Pinecone (Optional)

python scripts/pinecone_initial_load.py

📂 Project Structure

retail_mlops/
├── .github/workflows/       # 3 automated pipelines
├── pages/                   # What-If + AI Analyst
├── scripts/                 # Vector DB builders
├── utils/                   # Shared modules
├── dashboard.py             # Main app
├── train.py                 # Model training
└── *.joblib, *.json, *.pkl  # Model artifacts

🔧 API Setup

Groq (Free)

  1. Get key: https://console.groq.com/
  2. Add to .env: GROQ_API_KEY=gsk_...

Pinecone

  1. Sign up: https://www.pinecone.io/
  2. Create index: retail-sales (384 dimensions, cosine)
  3. Get API key from dashboard
  4. Add to .env:
    PINECONE_API_KEY=your-key
    PINECONE_ENVIRONMENT=us-east-1-aws
    PINECONE_INDEX_NAME=retail-sales
    

🎯 Performance

  • Vector DB: 500K+ vectors, 384-dim embeddings (Pinecone)
  • Query Latency: <2s (search + LLM)
  • Model Accuracy: RMSE ~500
  • Uptime: 99.9% (Streamlit Cloud + Pinecone)
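
For reference, RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted values, so it is expressed in the same units as the sales figures. A minimal implementation, with made-up numbers for illustration:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented values; errors are 10, 10 and 30.
example_rmse = rmse([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
```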

🔮 Roadmap

  • FastAPI deployment
  • LSTM/Transformer models
  • Real-time alerts
  • A/B testing framework

👤 Author

Yamin Hossain | GitHub


🙏 Credits

Kaggle • Groq • Pinecone • Streamlit • Upstash

⭐ Star this repo if you find it helpful!

About

This project is a production-grade MLOps pipeline designed to forecast retail sales in near-real-time.
