This project focuses on detecting fraudulent credit card transactions in a highly imbalanced dataset.
The objective is to develop a machine learning model that accurately distinguishes fraudulent from legitimate transactions using anonymized data.
Dataset used: Kaggle Credit Card Fraud Dataset
It contains 284,807 transactions, of which 492 (~0.17%) are fraud cases.
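The arithmetic behind the imbalance can be checked in a few lines. The sketch below uses only the two counts quoted above; the variable names are illustrative. It also shows why plain accuracy is a poor yardstick here: a model that labels every transaction as legitimate would already score very high on accuracy while catching no fraud.

```python
# Counts taken from the dataset description above.
n_total = 284_807
n_fraud = 492

# Share of fraudulent transactions in the full dataset.
fraud_rate = n_fraud / n_total

# Accuracy of a trivial classifier that always predicts "legitimate".
baseline_accuracy = 1 - fraud_rate

print(f"Fraud rate:        {fraud_rate:.4%}")
print(f"Baseline accuracy: {baseline_accuracy:.4%}")
```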
- Preprocessing: Dropped 'Time', scaled 'Amount'
- Handled class imbalance using SMOTE and ADASYN
- Models: Logistic Regression, Random Forest, XGBoost
- Stratified k-fold cross-validation, so each fold keeps the original fraud ratio
- ROC-AUC as the primary metric
- Best ROC-AUC: ~0.977 (XGBoost)
- Balanced precision/recall with a custom decision threshold (the threshold shifts the precision/recall trade-off; ROC-AUC itself is threshold-independent)
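One common way to choose a custom threshold is to scan the precision-recall curve and keep the threshold with the highest F1 score. The source does not state which criterion was used, so F1 is an assumption here, and `pick_threshold_by_f1` is a hypothetical helper rather than a library function. The labels and scores are toy values; in practice they would come from a held-out set and the model's predicted probabilities.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def pick_threshold_by_f1(y_true, y_score):
    """Return (threshold, f1) maximizing F1 over the candidate thresholds."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # The final precision/recall pair has no matching threshold, so drop it.
    precision, recall = precision[:-1], recall[:-1]
    denom = precision + recall
    # Guard against 0/0 when both precision and recall are zero.
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Toy held-out labels and predicted fraud probabilities (illustrative only).
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90])

best_threshold, best_f1 = pick_threshold_by_f1(y_true, y_score)
# Apply the tuned threshold instead of the default 0.5 cut-off.
y_pred = (y_score >= best_threshold).astype(int)
print(best_threshold, round(best_f1, 3), y_pred)
```

On this toy input the selected threshold is 0.35, which recovers all three positives at the cost of one false positive. A different criterion, such as a minimum recall constraint, would plug into the same scan.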
Tools used: Python, Pandas, Scikit-learn, XGBoost, Imbalanced-learn, Matplotlib, Seaborn