A full-fledged fraud detection pipeline using machine learning and AI interpretability tools. This project includes data exploration, feature engineering, model selection (with XGBoost), visual explanations (SHAP), a Flask API, and a Streamlit frontend. We also integrated Google’s Gemini AI for advanced prediction interpretation.
- Source: `synthetic_fraud_dataset.csv`
- We conducted Exploratory Data Analysis (EDA) using `ydata-profiling` and custom visualizations to understand:
  - Feature distributions
  - Correlation heatmaps
  - Class imbalance
  - Potential data quality issues
The EDA helped inform downstream preprocessing and feature engineering.
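
The class-imbalance, data-quality, and correlation checks can be sketched with plain pandas. This is a minimal illustration on a tiny made-up frame, not the project's notebook; the label column name `Fraud_Label` and all values are assumptions.

```python
import pandas as pd

# Toy stand-in for synthetic_fraud_dataset.csv. The values and the
# label column name "Fraud_Label" are assumptions for illustration.
df = pd.DataFrame({
    "Transaction_Amount": [120.0, 1500.0, 35.5, 980.0, 60.0, 2200.0],
    "Risk_Score": [0.10, 0.91, 0.05, 0.40, 0.20, 0.88],
    "Fraud_Label": [0, 1, 0, 0, 0, 1],
})

# Class imbalance: share of each label value.
class_share = df["Fraud_Label"].value_counts(normalize=True)

# Data quality: number of missing values per column.
missing = df.isna().sum()

# Pairwise correlations (the matrix a heatmap would display).
corr = df.corr()

print(class_share)
print(missing)
print(corr.round(2))
```

On the real dataset, the same three checks would run on the frame loaded with `pd.read_csv`, and `ydata-profiling` can produce a fuller HTML report from that frame.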
We selected and transformed features to improve model accuracy and reduce noise.

Features retained:
- `Account_Balance`, `IP_Address_Flag`, `Previous_Fraudulent_Activity`
- `Daily_Transaction_Count`, `Risk_Score`, `Is_Weekend`, `Hour`, `Amount_Deviation`, etc.

Features dropped:
- Identifiers like `Transaction_ID`, `User_ID`
- Raw `Timestamp`, `Device_Type`, `Location` (transformed into derived features)
- Low-information fields like `Card_Type`, `Authentication_Method`
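
As a rough sketch of how derived features such as Hour, Is_Weekend, and Amount_Deviation might be computed from raw fields: the helper name `derive_features`, the ISO timestamp format, and the definition of Amount_Deviation (absolute gap to the user's average amount) are all assumptions, not taken from the project code.

```python
from datetime import datetime


def derive_features(timestamp: str, amount: float, avg_amount: float) -> dict:
    """Derive Hour, Is_Weekend and Amount_Deviation from raw fields.

    Assumptions: ISO-8601 timestamps, and Amount_Deviation defined as the
    absolute gap between this amount and the user's average amount.
    """
    ts = datetime.fromisoformat(timestamp)
    return {
        "Hour": ts.hour,
        "Is_Weekend": int(ts.weekday() >= 5),  # Monday=0 ... Saturday=5, Sunday=6
        "Amount_Deviation": abs(amount - avg_amount),
    }


# 2025-03-08 is a Saturday.
features = derive_features("2025-03-08 23:15:00", 1500.0, 200.0)
print(features)
```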
We scaled and normalized the final dataset, saving it as preprocessed_fraud_dataset_numerical_only.csv.
We tested multiple classifiers:
- ✅ Logistic Regression
- ✅ Random Forest
- ✅ Gradient Boosting
- ✅ XGBoost (Best Performance)
After selecting XGBoost, we fine-tuned hyperparameters and saved the trained pipeline to fraud_detection_model_numerical.pkl.
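
The comparison and save step could look roughly like the sketch below. It runs on synthetic data and uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost; the candidate names, the cross-validation setup, and the output file name `model_sketch.pkl` are assumptions for illustration, not the project's training script.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (assumption: about 10% positives).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.9, 0.1], random_state=0
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score every candidate inside the same scaling pipeline.
scores = {}
for name, clf in candidates.items():
    pipe = Pipeline([("scale", StandardScaler()), ("clf", clf)])
    scores[name] = cross_val_score(pipe, X, y, cv=3, scoring="roc_auc").mean()

# Refit the best-scoring candidate on all data.
best_name = max(scores, key=scores.get)
best_pipe = Pipeline(
    [("scale", StandardScaler()), ("clf", candidates[best_name])]
).fit(X, y)

# Persist and reload the fitted pipeline, as one would with a .pkl file.
path = os.path.join(tempfile.mkdtemp(), "model_sketch.pkl")
joblib.dump(best_pipe, path)
reloaded = joblib.load(path)
print(best_name, round(scores[best_name], 3))
```

Saving the scaler and classifier together as one pipeline means the API can apply the same preprocessing at prediction time without extra code.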
We built utility functions to:
- Predict fraud likelihood on new transactions
- Show evaluation metrics:
  - Confusion Matrix
  - Precision-Recall Curve
  - Classification Report
  - ROC-AUC Score
- Visualize feature importances and interpret predictions with SHAP

Sample output:

    === TRANSACTION DETAILS ===
    Transaction_Amount: 1500
    ...
    Amount_Deviation: 1300
    === PREDICTION ===
    Prediction: FRAUD
    Fraud Probability: 0.82

Visuals include:
- SHAP summary bar plot
- Force plots for individual transaction explanations
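
The evaluation metrics listed above map directly onto `sklearn.metrics`. The snippet below is a small sketch on hypothetical labels and probabilities; the 0.5 decision threshold and the class names `legit` and `fraud` are assumptions.

```python
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    precision_recall_curve,
    roc_auc_score,
)

# Hypothetical held-out labels and predicted fraud probabilities.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.20, 0.35, 0.60, 0.82, 0.40, 0.10, 0.95]

threshold = 0.5  # assumed decision threshold
y_pred = [int(p >= threshold) for p in y_prob]

cm = confusion_matrix(y_true, y_pred)  # rows: actual class, columns: predicted
auc = roc_auc_score(y_true, y_prob)    # uses probabilities, not hard labels
precision, recall, thresholds = precision_recall_curve(y_true, y_prob)

print(cm)
print(classification_report(y_true, y_pred, target_names=["legit", "fraud"]))
print(f"ROC-AUC: {auc:.3f}")
```

The `precision` and `recall` arrays are what a Precision-Recall curve plots against each other, for example with Matplotlib.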
We exposed the model via a REST API using Flask.
| Method | Endpoint | Description |
|---|---|---|
| POST | /predict | Predict using XGBoost model |
| POST | /ai-prediction | Get prediction + explanation via Gemini |
| GET | /health | API health check |
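
A minimal Flask sketch of the `/health` and `/predict` routes is shown below. The JSON request and response shapes, the `score_transaction` placeholder, and the `LEGITIMATE` label are assumptions; the real service would score with the saved model pipeline instead of a hard-coded rule.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def score_transaction(features: dict) -> float:
    """Placeholder scorer (assumption). The real service would call
    predict_proba on the pipeline loaded from the saved .pkl file."""
    return 0.82 if features.get("Amount_Deviation", 0) > 1000 else 0.03


@app.route("/health", methods=["GET"])
def health():
    return jsonify({"status": "ok"})


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a non-JSON body.
    features = request.get_json(silent=True)
    if not isinstance(features, dict):
        return jsonify({"error": "expected a JSON object"}), 400
    prob = score_transaction(features)
    return jsonify({
        "prediction": "FRAUD" if prob >= 0.5 else "LEGITIMATE",
        "fraud_probability": prob,
    })


# Exercise the routes in-process with Flask's test client.
client = app.test_client()
health_resp = client.get("/health")
predict_resp = client.post(
    "/predict", json={"Transaction_Amount": 1500, "Amount_Deviation": 1300}
)
print(health_resp.get_json(), predict_resp.get_json())
```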
Example AI Prompt (Gemini):
"You are an AI fraud detection assistant. Based on the following transaction data..."

Uses `gemini-2.5-pro-exp-03-25` for rich explanations alongside predictions.
We developed a simple, responsive Streamlit app to interact with the API:
- Enter transaction details manually
- Get instant predictions
- View explanations (via AI or SHAP)
- Toggle light/dark theme
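
On the client side, a frontend talks to the API by posting the transaction as JSON. The sketch below uses only the standard library; the URL `http://localhost:5000/predict` (Flask's default port) and the helper names are assumptions, and the actual Streamlit app may use a different HTTP client.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/predict"  # assumed host and port


def build_predict_request(transaction: dict) -> urllib.request.Request:
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps(transaction).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_prediction(transaction: dict, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON reply (needs a running API)."""
    request = build_predict_request(transaction)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Build (but do not send) a request, so this runs without a live server.
req = build_predict_request({"Transaction_Amount": 1500, "Amount_Deviation": 1300})
print(req.get_method(), req.full_url)
```

In a Streamlit app, `fetch_prediction` would typically be called from a button handler with the values collected from the input widgets.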

`bash app/start.sh`
- Start the Flask API: `python app/api.py`
- Launch the Streamlit UI: `streamlit run app/frontend.py`

Tech stack:
- Python (Pandas, Scikit-learn, XGBoost, SHAP)
- Flask for backend API
- Streamlit for UI
- Google Gemini AI for natural language explanations
- Matplotlib & Seaborn for visualization
- Joblib for model serialization

Future improvements:
- Add authentication & rate limiting to API
- Integrate live transaction feeds for real-time fraud scoring
- Add model monitoring dashboard (e.g., with Grafana + Prometheus)
- Expand AI explanation options using multiple LLMs
Feel free to open issues, fork the repo, or create pull requests. Your feedback and ideas are welcome!
MIT License © 2025