End-to-end Machine Learning pipeline for detecting fraudulent credit card transactions
Built with a focus on imbalanced-data handling, production-grade practices, and a deployable API.
- Complete ML pipeline: from raw data to live prediction API
- Effective handling of extreme class imbalance (~0.17% fraud cases)
- Decision-threshold tuning to set the precision-recall trade-off used in production
- Stratified cross-validation + model comparison
- FastAPI inference service
- Explainability support via feature importance
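Stratified cross-validation keeps the ~0.17% fraud rate roughly constant in every fold, so no validation fold ends up with zero or almost no fraud cases. The dependency-free sketch below illustrates the idea by dealing each class's row indices round-robin across folds; `stratified_folds` and `train_val_splits` are hypothetical helpers, not code from `src/`.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=42):
    """Return n_splits lists of row indices, each with roughly the same class mix."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets its share.
        for i, idx in enumerate(indices):
            folds[i % n_splits].append(idx)
    return folds


def train_val_splits(labels, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs, using each fold once as the validation set."""
    folds = stratified_folds(labels, n_splits, seed)
    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, val_idx


if __name__ == "__main__":
    # Toy labels: 1,000 rows, 10 of them fraud (1%).
    labels = [1 if i % 100 == 0 else 0 for i in range(1000)]
    for train_idx, val_idx in train_val_splits(labels):
        print(len(val_idx), sum(labels[i] for i in val_idx))
```

With scikit-learn installed, `sklearn.model_selection.StratifiedKFold(n_splits=5, shuffle=True, random_state=42)` provides the same per-fold class balance and is the usual choice in practice.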
Best model: XGBoost
After threshold optimization it reaches 0.96 precision and 0.78 recall on the fraud class (PR-AUC 0.84).
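Threshold optimization replaces the default 0.5 cut-off on the predicted fraud probability with a value chosen on validation data. This README does not state which criterion train.py optimizes; the sketch below assumes one common choice, the threshold that maximizes fraud-class F1, and `tune_threshold` is a hypothetical helper.

```python
def precision_recall_at(y_true, scores, threshold):
    """Fraud-class precision and recall when flagging every score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def tune_threshold(y_true, scores):
    """Return (threshold, f1) that maximizes fraud-class F1 over the observed scores."""
    best_t, best_f1 = 0.5, -1.0
    # Only the distinct scores can change the confusion matrix, so try each one.
    for t in sorted(set(scores)):
        p, r = precision_recall_at(y_true, scores, t)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


y_val = [0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
print(tune_threshold(y_val, p_val))
```

This loop is quadratic in the number of rows; on the full validation set, `sklearn.metrics.precision_recall_curve` returns precision and recall for every threshold in one pass. If the business goal is a minimum precision (to cap false alarms), pick the lowest threshold that meets it instead of maximizing F1.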
Dataset: Credit Card Fraud Detection (anonymized transactions, September 2013)
→ Source: Kaggle – Credit Card Fraud Detection
Features:
- Time: seconds elapsed since the first transaction
- V1–V28: PCA-transformed features
- Amount: transaction amount
- Class: target (0 = normal, 1 = fraud)
Key statistics:
- Total transactions: 284,807
- Fraud cases: 492 (~0.172% – highly imbalanced)
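The imbalance figures above follow directly from the label counts. One common way to pass that ratio to XGBoost is its `scale_pos_weight` parameter, for which the XGBoost documentation suggests `sum(negative instances) / sum(positive instances)` as a typical value; whether this project uses it, resampling, or another technique is not stated here, so treat this as an illustration only.

```python
def imbalance_stats(n_total, n_fraud):
    """Return the fraud rate (fraction) and the negative/positive ratio."""
    n_normal = n_total - n_fraud
    fraud_rate = n_fraud / n_total
    # Ratio XGBoost's docs suggest as a starting value for scale_pos_weight.
    scale_pos_weight = n_normal / n_fraud
    return fraud_rate, scale_pos_weight


rate, spw = imbalance_stats(284_807, 492)
print(f"fraud rate: {rate:.4%}")          # fraud rate: 0.1727%
print(f"scale_pos_weight ~ {spw:.1f}")    # scale_pos_weight ~ 577.9
```

In training this would become e.g. `xgboost.XGBClassifier(scale_pos_weight=spw)`, with the counts taken from the training split only so validation data does not leak into the weighting.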
- Go to: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
- Sign in / create a free Kaggle account
- Click Download (file: creditcard.csv, ~150 MB)
- Place the file in the data/ folder (create it if needed)

Note: The dataset is not included in this repository due to its size and licensing.
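After placing creditcard.csv in data/, a quick header and row-count check can catch a wrong or truncated download before training. This is a dependency-free sketch; the default path and the `check_dataset` helper are assumptions, and preprocessing.py may load the file differently (for example with pandas).

```python
import csv
from pathlib import Path

# Expected header: Time, V1..V28, Amount, Class (31 columns).
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def check_dataset(path="data/creditcard.csv"):
    """Verify the header and return (n_rows, n_fraud) for the CSV at `path`."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; download it from Kaggle first")
    with path.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        if header != EXPECTED_COLUMNS:
            raise ValueError(f"unexpected columns: {header[:5]}...")
        class_idx = header.index("Class")
        n_rows = n_fraud = 0
        for row in reader:
            n_rows += 1
            # Class is 0 (normal) or 1 (fraud); float() also tolerates "0.0".
            n_fraud += int(float(row[class_idx]))
    return n_rows, n_fraud
```

For the full Kaggle file, `check_dataset()` should report the counts listed under Key statistics (284,807 rows, 492 fraud cases).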
fraud_detection/
├── data/ # Put creditcard.csv here
├── notebooks/
│ └── 01_eda.ipynb # EDA
├── src/
│ ├── preprocessing.py
│ ├── models.py
│ ├── train.py
│ ├── predict.py
│ ├── evaluation.py
│ ├── interpret.py
│ └── api.py
├── models/
│ ├── best_model.pkl
│ ├── best_scaler.pkl
│ └── best_threshold.pkl
├── requirements.txt
├── .gitignore
└── README.md
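The three files in models/ suggest a simple inference contract: scale the features with the saved scaler, get the fraud probability from the model, and compare it with the saved threshold instead of 0.5. The sketch below assumes the files were written with `pickle` (if they were saved with `joblib.dump`, load them with `joblib.load` instead) and that the model and scaler follow scikit-learn's `predict_proba` / `transform` interface, as `XGBClassifier` and scalers such as `StandardScaler` do. `load_artifacts` and `predict_fraud` are hypothetical names, not the contents of predict.py.

```python
import pickle
from pathlib import Path


def load_artifacts(models_dir="models"):
    """Load (model, scaler, threshold) saved as pickle files in models_dir."""
    models_dir = Path(models_dir)
    artifacts = []
    for name in ("best_model.pkl", "best_scaler.pkl", "best_threshold.pkl"):
        # Only unpickle files you trust: pickle can execute arbitrary code.
        with (models_dir / name).open("rb") as f:
            artifacts.append(pickle.load(f))
    model, scaler, threshold = artifacts
    return model, scaler, float(threshold)


def predict_fraud(model, scaler, threshold, rows):
    """Return (probability, is_fraud) per row; rows are lists of raw feature values."""
    # Assumes the scaler was fit on the same full feature vector the model expects.
    X = scaler.transform(rows)
    # Column 1 of predict_proba is P(Class == 1), the fraud probability.
    probs = [float(p[1]) for p in model.predict_proba(X)]
    return [(p, p >= threshold) for p in probs]
```

Keeping the threshold as a separate artifact means it can be re-tuned (for example after a change in the acceptable false-alarm rate) without retraining the model.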
- Installation
# Clone repository
git clone https://github.com/YOUR-USERNAME/credit-card-fraud-detection.git
cd credit-card-fraud-detection
# Create & activate virtual environment
python -m venv venv
source venv/bin/activate # Linux / macOS
# or
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
- Train the model
cd src
python train.py
cd .. # back to the repository root
- Run API
uvicorn src.api:app --reload --port 8000
→ Open the interactive API docs (Swagger UI): http://127.0.0.1:8000/docs
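Once the server is running, any HTTP client can call it. The route and request schema are not documented in this README, so the standard-library sketch below assumes a hypothetical POST /predict endpoint that accepts a JSON object with the 30 raw features (Time, V1–V28, Amount); check the real route and field names on the /docs page before relying on it.

```python
import json
import urllib.request

# Raw input features in dataset column order (Class is the target, so it is excluded).
FEATURES = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]


def build_payload(values):
    """Map 30 raw feature values (in dataset column order) to a JSON-ready dict."""
    if len(values) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(values)}")
    return dict(zip(FEATURES, values))


def predict_remote(values, url="http://127.0.0.1:8000/predict"):
    """POST one transaction to the (hypothetical) /predict route; return parsed JSON."""
    body = json.dumps(build_payload(values)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage, with the API running: `predict_remote([0.0] * 30)` sends an all-zero transaction and returns whatever JSON the endpoint produces.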
| Metric | Value |
|---|---|
| Precision (Fraud) | 0.96 |
| Recall (Fraud) | 0.78 |
| PR-AUC | 0.84 |
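PR-AUC summarizes the precision-recall curve over all thresholds and is commonly preferred over ROC-AUC when positives are as rare as here. The sketch below computes it as average precision, AP = Σ (R_n − R_(n−1)) · P_n over thresholds in descending score order, which is the estimator behind scikit-learn's `average_precision_score`; whether evaluation.py uses this estimator or a trapezoidal area under the curve is not stated, and `average_precision` is a hypothetical helper.

```python
def average_precision(y_true, scores):
    """Step-wise PR-AUC: sum over thresholds of (recall gain) * precision."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(pairs):
        # Treat tied scores as a single threshold.
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


y_val = [0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
print(round(average_precision(y_val, p_val), 4))  # 0.9167
```

For context, a classifier that scores at random has an expected average precision close to the positive rate (about 0.0017 on this dataset), so the baseline for the 0.84 above is near zero rather than 0.5.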
MIT License – feel free to use this project for learning purposes.
