This project builds a fraud detection system that uses feature engineering, SMOTE balancing, and machine learning models to identify fraudulent financial transactions.
The goal is to classify each transaction as fraud or non-fraud accurately, so that fewer fraudulent transactions go undetected and financial risk is reduced.
The system detects fraudulent transactions by analyzing:
- Transaction type
- Transaction amount
- Account balance changes
- Engineered balance-difference features
Tech stack:
- Python
- Pandas
- NumPy
- Scikit-learn
- XGBoost
- SMOTE (Imbalanced Learning)
- Matplotlib & Seaborn
- SHAP (Model Explainability)
Workflow:
- Exploratory Data Analysis (EDA)
- Feature Engineering:
  - org_diff
  - dest_diff
  - amount_log
  - zero-balance indicators
  - one-hot encoding
- Handling Class Imbalance using SMOTE
- Train-Test Split (Stratified 80/20)
- Model Training:
  - Logistic Regression
  - Random Forest
  - XGBoost
- Model Evaluation:
  - Precision
  - Recall
  - F1-Score
  - ROC-AUC
  - PR-AUC
- SHAP Explainability
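
The engineered features named in the workflow could be built roughly as follows. This is a minimal sketch, not the notebook's actual code: the raw column names (`type`, `amount`, `oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, `newbalanceDest`), the exact definitions of `org_diff` and `dest_diff`, and the indicator names `org_zero_after` and `dest_zero_before` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add balance-difference, log-amount, zero-balance and one-hot features.

    All input column names are assumed, not taken from the project.
    """
    out = df.copy()

    # Assumed definition: how much the origin account balance dropped.
    out["org_diff"] = out["oldbalanceOrg"] - out["newbalanceOrig"]
    # Assumed definition: how much the destination account balance grew.
    out["dest_diff"] = out["newbalanceDest"] - out["oldbalanceDest"]

    # log1p compresses the long right tail of amounts and is defined at 0.
    out["amount_log"] = np.log1p(out["amount"])

    # Zero-balance indicators (hypothetical names).
    out["org_zero_after"] = (out["newbalanceOrig"] == 0).astype(int)
    out["dest_zero_before"] = (out["oldbalanceDest"] == 0).astype(int)

    # One-hot encode the transaction type.
    out = pd.get_dummies(out, columns=["type"], prefix="type", dtype=int)
    return out


# Tiny made-up sample, only to show the shape of the output.
sample = pd.DataFrame(
    {
        "type": ["TRANSFER", "CASH_OUT", "PAYMENT"],
        "amount": [1000.0, 250.0, 40.0],
        "oldbalanceOrg": [1000.0, 500.0, 100.0],
        "newbalanceOrig": [0.0, 250.0, 60.0],
        "oldbalanceDest": [0.0, 100.0, 0.0],
        "newbalanceDest": [1000.0, 350.0, 0.0],
    }
)
features = engineer_features(sample)
print(features.columns.tolist())
```

Keeping the transformations in one function makes it easier to apply exactly the same steps to training data and to new transactions at scoring time.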
Results:
- Precision: 99.13%
- Recall: 99.56%
- F1-Score: 99.34%
- ROC-AUC: 99.99%
- PR-AUC: 99.95%
XGBoost achieved the highest recall and PR-AUC of the three models, making it the most effective at detecting fraud.
Key fraud indicators:
- Origin balance difference (org_diff)
- Log-transformed transaction amount
- Zero-balance behavior
- CASH_OUT & TRANSFER transaction types
- Abnormal balance drops
Business impact:
- Reduces financial losses
- Improves fraud detection recall
- Enables real-time risk scoring
- Supports scalable fraud prevention systems
How to run:
- Clone the repository
- Install the dependencies
- Run the Jupyter notebook
Author: Subodh Kumar
Machine Learning & Data Science Enthusiast