Credit card fraud is a significant issue in the financial industry, leading to massive financial losses every year. The goal of this project is to build a fraud detection model that can distinguish between fraudulent and non-fraudulent transactions using machine learning.
The dataset is highly imbalanced, with fraudulent transactions making up less than 0.2% of all transactions. To address the imbalance, the minority class is oversampled with SMOTE (Synthetic Minority Over-sampling Technique), and features are standardized before training. Three models were trained and evaluated to identify the most effective fraud detection method.
The dataset used in this project is the well-known Kaggle Credit Card Fraud Detection dataset.
- Transactions: 284,807
- Fraudulent transactions: 492 (0.17%)
- Features: 30 (28 anonymized PCA components, `Time`, and `Amount`)
- Target: `Class` (0 = Non-Fraud, 1 = Fraud)
Dataset link: Credit Card Fraud Detection - Kaggle
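
The imbalance figures above can be checked with a few lines of arithmetic. The sketch below uses only the counts from the dataset summary (the variable names are illustrative). It also shows why plain accuracy is a poor yardstick here: a model that never flags fraud would still score very high.

```python
# Counts taken from the dataset summary above.
n_total = 284_807
n_fraud = 492

fraud_rate = n_fraud / n_total

# A trivial "model" that labels every transaction as non-fraud.
baseline_accuracy = (n_total - n_fraud) / n_total

print(f"Fraud rate: {fraud_rate:.2%}")                              # 0.17%
print(f"Accuracy of always predicting non-fraud: {baseline_accuracy:.2%}")  # 99.83%
```

This is the reason the evaluation relies on precision, recall, F1-score, and ROC-AUC rather than accuracy.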
- Exploratory Data Analysis (EDA)
  - Checked class imbalance.
  - Distribution of transaction `Amount` and `Time`.
  - Correlation heatmap of features.
- Data Preprocessing
  - Feature scaling using StandardScaler.
  - Oversampling the minority class using SMOTE.
- Model Training
  - Models applied:
    - Logistic Regression
    - Random Forest
    - XGBoost
- Evaluation Metrics
  - Precision
  - Recall
  - F1-Score
  - ROC-AUC Score
  - Confusion Matrix
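
To make the preprocessing steps above more concrete, here is a simplified, dependency-free sketch of what standardization and SMOTE-style oversampling do. It is not the project's actual code, which presumably relies on scikit-learn's `StandardScaler` and a library SMOTE implementation. The helper names (`fit_standardizer`, `standardize`, `smote_like`) and the toy data are hypothetical. The sketch also assumes that scaling statistics and oversampling are computed on the training split only, which the workflow does not state explicitly.

```python
import math
import random
import statistics


def fit_standardizer(rows):
    """Return per-column (mean, std) computed from the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply z = (x - mean) / std column by column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy training set: two features, 8 non-fraud and 4 fraud rows.
train_X = [[10.0, 1.0], [12.0, 0.5], [11.0, 1.5], [9.0, 1.2],
           [13.0, 0.8], [10.5, 1.1], [11.5, 0.9], [12.5, 1.3],
           [95.0, 6.0], [110.0, 7.5], [102.0, 6.8], [99.0, 7.1]]
train_y = [0] * 8 + [1] * 4

# Fit the scaling statistics on the training rows, then transform them.
means, stds = fit_standardizer(train_X)
train_X = standardize(train_X, means, stds)

# Oversample the fraud class until both classes have the same size.
minority = [x for x, y in zip(train_X, train_y) if y == 1]
n_needed = train_y.count(0) - train_y.count(1)
new_rows = smote_like(minority, n_needed)

train_X += new_rows
train_y += [1] * len(new_rows)
print(train_y.count(0), train_y.count(1))  # both classes now have 8 rows
```

Because every synthetic row lies on a line segment between two real fraud rows, oversampling adds no information beyond the original minority samples. It only rebalances what the model sees during training, so the test set should be left at its original class ratio.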
| Model | Precision | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|
| Logistic Regression | 0.13 | 0.89 | 0.23 | 0.97 |
| Random Forest | 0.83 | 0.82 | 0.83 | 0.96 |
| XGBoost | 0.80 | 0.85 | 0.83 | 0.98 |
XGBoost performed best overall, with the highest ROC-AUC score (0.98) and an F1-score (0.83) matching Random Forest.
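
The F1-score column follows from precision and recall as their harmonic mean, which explains why Logistic Regression scores so low on F1 despite its high recall. The sketch below recomputes F1 from the rounded table values. The helper name `harmonic_f1` is illustrative, and because the inputs are already rounded to two decimals, the recomputed values can differ from the table in the last digit.

```python
def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall values copied from the results table.
reported = {
    "Logistic Regression": (0.13, 0.89),
    "Random Forest": (0.83, 0.82),
    "XGBoost": (0.80, 0.85),
}

for model, (precision, recall) in reported.items():
    print(f"{model}: F1 ~ {harmonic_f1(precision, recall):.3f}")
```

The harmonic mean is dominated by the smaller of the two values, so a precision of 0.13 caps the F1-score near 0.23 no matter how high recall is.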
- Use deep learning models such as Autoencoders for anomaly detection.
- Implement a real-time fraud detection pipeline.
- Apply hyperparameter tuning for model optimization.
- Deploy the model as a Flask/Django web app or using Streamlit.
If you have any questions or suggestions, feel free to connect!