davinaics/retail-customer-analytics

🛒 Retail Customer Analytics Dashboard

Customer Segmentation & Purchase Pattern Analysis

📌 Project Overview

This project analyzes retail transaction data to:

  • Segment customers based on their purchasing behavior using RFM Analysis and K-Means Clustering
  • Visualize customer clusters using PCA (Principal Component Analysis)
  • Discover product purchase patterns using Market Basket Analysis (Apriori Algorithm)
  • Provide an interactive analytics dashboard using Streamlit

The goal is to support data-driven marketing and business strategy decisions.


🎯 Business Objectives

  • Identify different customer segments (e.g., loyal, inactive, high spenders)
  • Support marketing strategies such as:
    • Loyalty programs
    • Personalized promotions
    • Product bundling
  • Understand which products are frequently purchased together

📂 Dataset

Online Retail Dataset
Transaction data with the following columns:

  • InvoiceNo
  • StockCode
  • Description
  • Quantity
  • InvoiceDate
  • UnitPrice
  • CustomerID
  • Country

Source: https://www.kaggle.com/datasets/ulrikthygepedersen/online-retail-dataset


🧹 Data Cleaning

The following preprocessing steps were applied:

  • Removed duplicate records
  • Removed rows with a missing CustomerID or Description
  • Converted InvoiceDate to datetime format
  • Removed cancelled transactions (InvoiceNo starting with "C")
  • Removed invalid values (Quantity ≤ 0, UnitPrice ≤ 0)
  • Removed outliers using the IQR method
  • Standardized text fields (Description, Country)

Feature engineering:

  • Created TotalPrice = Quantity × UnitPrice
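
The cleaning and feature-engineering steps above could be sketched in pandas roughly as follows. This is an illustration rather than the project's actual code: the function name `clean_transactions`, the choice to apply the IQR filter to Quantity and UnitPrice with the usual 1.5 × IQR fences, and the strip/uppercase text standardization are all assumptions, since the README does not specify them.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed cleaning steps to a raw transactions frame (sketch)."""
    df = df.drop_duplicates()
    df = df.dropna(subset=["CustomerID", "Description"])
    df = df.assign(InvoiceDate=pd.to_datetime(df["InvoiceDate"]))
    # Cancelled transactions have an InvoiceNo starting with "C".
    df = df[~df["InvoiceNo"].astype(str).str.startswith("C")]
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)]
    # Assumption: the IQR filter targets Quantity and UnitPrice, 1.5 * IQR fences.
    for col in ["Quantity", "UnitPrice"]:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df = df[df[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
    # Assumption: "standardized" means trimmed whitespace and consistent casing.
    df = df.assign(
        Description=df["Description"].str.strip().str.upper(),
        Country=df["Country"].str.strip(),
    )
    return df.assign(TotalPrice=df["Quantity"] * df["UnitPrice"])


# Tiny made-up sample: one duplicate, one cancellation, one missing
# CustomerID, and one zero-quantity row.
raw = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "C1002", "1003", "1004", "1005", "1006"],
    "StockCode": ["A", "A", "A", "B", "B", "B", "C"],
    "Description": [" mug ", " mug ", "mug", "plate", "plate", "plate", "bowl"],
    "Quantity": [2, 2, -2, 3, 0, 4, 3],
    "InvoiceDate": ["2024-01-01"] * 7,
    "UnitPrice": [3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 5.0],
    "CustomerID": [1.0, 1.0, 1.0, None, 2.0, 2.0, 3.0],
    "Country": ["United Kingdom"] * 7,
})
cleaned = clean_transactions(raw)
```

On the real dataset the order of the steps matters: the IQR fences are computed after invalid and cancelled rows are dropped, so those rows do not distort the quartiles.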

📊 RFM Analysis

RFM variables:

  • Recency → Number of days since the customer's most recent transaction
  • Frequency → Number of unique invoices per customer
  • Monetary → Total spending per customer

These variables represent customer purchasing behavior and are used as input for clustering.
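
As a minimal sketch, the three RFM variables could be derived per customer with a pandas group-by. The function name `compute_rfm` and the reference date (one day after the latest transaction in the data) are assumptions; the README does not say which snapshot date the project uses.

```python
import pandas as pd


def compute_rfm(df: pd.DataFrame, snapshot_date=None) -> pd.DataFrame:
    """Return one row per CustomerID with Recency, Frequency, Monetary."""
    # Assumption: recency is measured from the day after the last transaction.
    if snapshot_date is None:
        snapshot_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)
    grouped = df.groupby("CustomerID")
    last_purchase = grouped["InvoiceDate"].max()
    return pd.DataFrame({
        "Recency": (snapshot_date - last_purchase).dt.days,
        "Frequency": grouped["InvoiceNo"].nunique(),
        "Monetary": grouped["TotalPrice"].sum(),
    })


# Made-up transactions for two customers.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-05"]),
    "TotalPrice": [10.0, 20.0, 5.0],
})
rfm = compute_rfm(tx)
```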


🤖 Customer Segmentation

  • Algorithm: K-Means Clustering
  • Input features: Scaled RFM variables
  • Visualization: PCA (2D scatter plot)
  • Output: Cluster label for each customer

Cluster profiling is performed using the average RFM values for each cluster.
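
The segmentation steps above might look roughly like this with scikit-learn. Treat it as a sketch under assumptions: the README only says the RFM variables are scaled, so `StandardScaler` is a guess, and the sample values, `k = 2`, and `random_state` are illustrative.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Made-up RFM table: three active high spenders, three inactive customers.
rfm = pd.DataFrame(
    {
        "Recency": [5, 7, 3, 200, 180, 220],
        "Frequency": [20, 18, 22, 1, 2, 1],
        "Monetary": [2000.0, 1800.0, 2200.0, 50.0, 80.0, 40.0],
    },
    index=pd.Index([101, 102, 103, 104, 105, 106], name="CustomerID"),
)

# Assumption: standard scaling, so no single variable dominates the distances.
scaled = StandardScaler().fit_transform(rfm)

# k is chosen interactively in the dashboard; 2 is used here for illustration.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(scaled)
rfm_labeled = rfm.assign(Cluster=labels)

# Project the scaled features to 2D for a scatter plot of the clusters.
coords = PCA(n_components=2).fit_transform(scaled)

# Cluster profiling: average RFM values per cluster.
profile = rfm_labeled.groupby("Cluster")[["Recency", "Frequency", "Monetary"]].mean()
```

The two columns of `coords` can be passed to a Matplotlib scatter plot colored by `labels`, and `profile` is the kind of table a profiling bar chart would be drawn from.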


🛍 Market Basket Analysis

  • Algorithm: Apriori
  • Metrics used:
    • Support
    • Confidence
    • Lift
  • Output:
    • Association rules such as:
      {Product A} → {Product B}

These rules can be used to support product recommendation and bundling strategies.
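
To make the three metrics concrete, the sketch below computes support, confidence, and lift for single-item rules in plain Python. It enumerates item pairs by brute force instead of using Apriori's pruned search, and it is not the Mlxtend-based code the project relies on; the baskets, function names, and the `min_support` threshold are made up for illustration.

```python
from itertools import permutations

# Made-up baskets: each set holds the items on one invoice.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def pair_rules(baskets, min_support=0.5):
    """Rules {a} -> {b} whose joint support reaches min_support."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        joint = support({a, b}, baskets)
        if joint < min_support:
            continue
        # confidence = P(b | a); lift = confidence / P(b)
        confidence = joint / support({a}, baskets)
        lift = confidence / support({b}, baskets)
        rules.append({
            "antecedent": a,
            "consequent": b,
            "support": joint,
            "confidence": confidence,
            "lift": lift,
        })
    return rules


rules = pair_rules(baskets)
```

A lift above 1 means the two items appear together more often than they would if purchased independently, which is what makes a rule a candidate for bundling.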


🖥 Dashboard Features

  • Interactive selection of number of clusters (k)
  • PCA scatter plot for cluster visualization
  • Bar chart for cluster profiling
  • Association rules table

🛠 Tech Stack

  • Python
  • Pandas, NumPy
  • Scikit-learn
  • Mlxtend
  • Matplotlib
  • Streamlit

🚀 How to Run

  1. Clone the repository:
git clone https://github.com/yourusername/retail-customer-analytics.git
cd retail-customer-analytics
  2. Install dependencies:
pip install -r requirements.txt
  3. Project structure:
retail-customer-analytics/
│
├── app.py
├── requirements.txt
├── README.md
└── data/
    └── online_retail.csv
  4. Run the Streamlit app:
streamlit run app.py
