JashanLabs/Credit-Card-Default

Credit Risk Modeling

Overview

This project builds a credit default prediction system that estimates the probability of default for customers using historical repayment and financial behavior.

The focus is not just classification, but:

  • Ranking users by risk
  • Maximizing recall of defaulters
  • Producing calibrated probability estimates for decision-making

Problem Statement

Given customer credit history, predict the likelihood of default in the next month.

This is a risk-sensitive problem where:

  • Missing a defaulter → financial loss
  • False positives → operational cost

Therefore, the system prioritizes:

  • High recall for defaulters
  • Efficient identification of high-risk users

Dataset

The dataset contains:

  • PAY_X: Repayment status (delay behavior)
  • BILL_AMT_X: Monthly bill amounts (debt)
  • PAY_AMT_X: Payments made

Target:

  • default (1 = default, 0 = non-default)

The dataset is imbalanced, making accuracy an unreliable metric.
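
To illustrate why accuracy misleads on imbalanced data, here is a minimal sketch. The 22% default rate is a made-up figure for illustration, not a statistic taken from this project.

```python
import numpy as np

# Hypothetical labels: 22 defaulters out of 100 customers (illustrative rate only).
y_true = np.array([1] * 22 + [0] * 78)

# A trivial "model" that predicts non-default for everyone.
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
recall = y_pred[y_true == 1].mean()  # fraction of actual defaulters that were flagged

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

The trivial model looks reasonable on accuracy while catching no defaulters at all, which is why recall and ranking metrics are used instead.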


Approach

1. Feature Engineering

Constructed behavior-driven features:

  • recent_delay → recent repayment severity
  • delay_trend → worsening payment behavior
  • utilization → credit usage ratio
  • pay_ratio → payment discipline
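
The features above could be derived along the following lines. This is a sketch under stated assumptions, not the notebook's actual formulas: the column suffixes (`PAY_1`, `PAY_3`, `BILL_AMT_1`, `PAY_AMT_1`), the month ordering, and the `LIMIT_BAL` credit-limit column are all assumed for illustration.

```python
import numpy as np
import pandas as pd

def add_behavior_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative feature definitions; the real notebook may define them differently."""
    out = df.copy()
    # Assumption: PAY_1 is the most recent month and PAY_3 is an older month.
    out["recent_delay"] = out["PAY_1"].clip(lower=0)
    out["delay_trend"] = out["PAY_1"] - out["PAY_3"]  # > 0 means delays are getting worse
    # Assumption: LIMIT_BAL holds the credit limit (this column is not listed in the README).
    out["utilization"] = out["BILL_AMT_1"] / out["LIMIT_BAL"].replace(0, np.nan)
    # Payment relative to the bill; a zero bill yields NaN instead of infinity.
    out["pay_ratio"] = out["PAY_AMT_1"] / out["BILL_AMT_1"].replace(0, np.nan)
    return out

# Two hypothetical customers.
sample = pd.DataFrame({
    "PAY_1": [2, -1], "PAY_3": [0, -1],
    "BILL_AMT_1": [5000.0, 0.0], "PAY_AMT_1": [500.0, 0.0],
    "LIMIT_BAL": [10000.0, 20000.0],
})
features = add_behavior_features(sample)
print(features[["recent_delay", "delay_trend", "utilization", "pay_ratio"]])
```

Replacing zero denominators with NaN keeps the ratios finite; tree-based models such as XGBoost can handle the resulting missing values natively.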

2. Models Used

  • Logistic Regression (baseline, interpretable)
  • XGBoost (final model, non-linear patterns)

3. Threshold Optimization

Instead of using the default 0.5 threshold, the decision threshold was tuned:

  • Lower threshold → higher recall
  • Selected threshold based on business trade-off
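
One possible way to pick such a threshold is to take the highest cutoff that still meets a recall target. The sketch below uses scikit-learn's `precision_recall_curve`; the helper name `threshold_for_recall`, the 0.80 target, and the toy scores are illustrative assumptions rather than the project's actual procedure.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_recall(y_true, y_score, target_recall=0.80):
    """Return (threshold, precision, recall) for the highest threshold meeting the recall target."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision and recall have one more entry than thresholds; drop the final point.
    meets_target = recall[:-1] >= target_recall
    idx = np.where(meets_target)[0][-1]  # thresholds increase, so the last match is the highest
    return thresholds[idx], precision[idx], recall[idx]

# Toy labels and scores, for illustration only.
y_true = np.array([0, 0, 1, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.45, 0.90])

thr, prec, rec = threshold_for_recall(y_true, y_score)
print(f"threshold={thr:.2f}, precision={prec:.2f}, recall={rec:.2f}")
```

Choosing the highest qualifying threshold keeps precision as high as possible for the given recall target; a cost-based rule that weighs missed defaulters against false alarms would be an alternative.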

4. Model Evaluation

Focused on:

  • Recall (defaulters)
  • Precision
  • Confusion matrix

Accuracy was not used as a primary metric.
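
A minimal sketch of computing these metrics at a chosen threshold with scikit-learn follows. The labels, scores, and the 0.35 cutoff are toy values for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy labels and predicted default probabilities, for illustration only.
y_true = np.array([0, 0, 1, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.45, 0.90])

threshold = 0.35  # hypothetical cutoff, lower than the default 0.5
y_pred = (y_score >= threshold).astype(int)

# For binary labels {0, 1}, ravel() yields counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)
precision = precision_score(y_true, y_pred)  # tp / (tp + fp)

print(f"tn={tn}, fp={fp}, fn={fn}, tp={tp}")
print(f"recall={recall:.2f}, precision={precision:.2f}")
```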


Key Results

  • The top 10% of users by predicted risk capture ~68% of all defaulters
  • Recall on defaulters improved to ~80%
  • XGBoost achieved higher precision than logistic regression at similar recall
  • The model effectively ranks users by risk
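
A top-decile capture figure of this kind can be computed as sketched below. The function name `capture_rate` and the toy data are assumptions for illustration; the numbers printed here are not the project's results.

```python
import numpy as np

def capture_rate(y_true, y_score, top_frac=0.10):
    """Share of all defaulters found among the top_frac highest-scored users."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score), kind="stable")  # highest risk first
    n_top = max(1, int(round(top_frac * len(y_true))))
    return y_true[order[:n_top]].sum() / y_true.sum()

# Toy example: 20 users with strictly decreasing scores and 4 defaulters.
y_score = np.arange(20, 0, -1) / 20
y_true = np.zeros(20, dtype=int)
y_true[[0, 1, 9, 18]] = 1

rate = capture_rate(y_true, y_score, top_frac=0.10)
lift = rate / 0.10  # relative to targeting 10% of users at random
print(f"capture rate={rate:.2f}, lift={lift:.1f}x")
```

Because the metric depends only on the ordering of scores, it measures ranking quality independently of any decision threshold.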

Calibration

The initial model was overconfident in its probability estimates.

Applied isotonic calibration:

  • Improved probability reliability
  • Did not affect ranking performance (isotonic regression is monotonic, so the score order is preserved up to ties)
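
The sketch below shows isotonic calibration on synthetic data using scikit-learn's `IsotonicRegression` directly. The project may instead have used a wrapper such as `CalibratedClassifierCV`; the data, the overconfidence simulation, and the split are all assumptions for illustration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Synthetic data: true default probabilities and the outcomes drawn from them.
true_p = rng.uniform(0.05, 0.60, size=5000)
y = rng.binomial(1, true_p)

# Simulate an overconfident model by stretching the log-odds by a factor of 3.
log_odds = np.log(true_p / (1 - true_p))
raw_score = 1 / (1 + np.exp(-3 * log_odds))

# Fit the calibrator on one half and evaluate on the other half.
fit_idx, eval_idx = np.arange(0, 2500), np.arange(2500, 5000)
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw_score[fit_idx], y[fit_idx])
calibrated = calibrator.predict(raw_score[eval_idx])

def brier(p, outcome):
    """Mean squared error between predicted probability and outcome (lower is better)."""
    return float(np.mean((p - outcome) ** 2))

brier_raw = brier(raw_score[eval_idx], y[eval_idx])
brier_cal = brier(calibrated, y[eval_idx])

# Monotonic mapping: sorting by raw score also sorts the calibrated values.
order = np.argsort(raw_score[eval_idx])
rank_preserved = bool(np.all(np.diff(calibrated[order]) >= -1e-12))

print(f"Brier raw={brier_raw:.4f}, calibrated={brier_cal:.4f}, rank preserved={rank_preserved}")
```

Fitting the calibrator on data the base model was not trained on avoids calibrating to overfit scores.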

Key Insights

  • Repayment delay is the strongest predictor of default
  • Behavioral signals outperform static financial metrics
  • Model is fragile without key delay features (validated via ablation)
  • Ranking users is more valuable than binary classification
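
An ablation of the kind mentioned above can be sketched as follows: train once with all features and once without the delay feature, then compare held-out performance. The data is synthetic and logistic regression stands in for the final model; feature names and coefficients are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 4000
delay = rng.integers(0, 4, size=n).astype(float)  # stand-in for a repayment-delay feature
utilization = rng.uniform(0, 1, size=n)           # stand-in for a weaker feature

# Synthetic outcome dominated by the delay feature (invented coefficients).
logit = -2.5 + 1.2 * delay + 0.3 * utilization
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
X = np.column_stack([delay, utilization])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

def holdout_auc(cols):
    """Train on the selected columns only and return ROC AUC on the held-out set."""
    model = LogisticRegression().fit(X_tr[:, cols], y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te[:, cols])[:, 1])

auc_full = holdout_auc([0, 1])  # delay + utilization
auc_ablated = holdout_auc([1])  # utilization only, delay removed

print(f"AUC full={auc_full:.3f}, AUC without delay={auc_ablated:.3f}")
```

A large drop after removing one feature group signals that the model depends heavily on it.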

Business Impact

The model enables:

  • Prioritization of high-risk customers
  • Efficient allocation of risk management resources
  • Estimation of expected default rates

Example:

  • Top 10% highest-risk users → ~68% of defaulters
    → Roughly 6.8× what random targeting of the same number of users would capture in expectation (~10%)
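
With calibrated probabilities, the expected default rate of a segment is the mean of its predicted probabilities, and the expected number of defaults is their sum. A minimal sketch, using hypothetical probabilities:

```python
import numpy as np

# Hypothetical calibrated default probabilities for five customers.
calibrated_p = np.array([0.02, 0.10, 0.35, 0.60, 0.80])

expected_defaults = calibrated_p.sum()  # expected number of defaulters in the segment
expected_rate = calibrated_p.mean()     # expected default rate of the segment

print(f"expected defaults={expected_defaults:.2f}, expected rate={expected_rate:.3f}")
```

This estimate is only as reliable as the calibration, which is why calibration is checked separately from ranking quality.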

Tech Stack

  • Python
  • Pandas, NumPy
  • Scikit-learn
  • XGBoost
  • Matplotlib / Seaborn

Project Structure

├── notebook.ipynb       # Full analysis and modeling
├── credit_model.pkl     # Saved calibrated model
├── README.md
└── requirements.txt

How to Run

git clone <repo_url>
cd <repo>
pip install -r requirements.txt
jupyter notebook notebook.ipynb

Future Improvements

  • Add more temporal features
  • Reduce reliance on the single dominant feature (repayment delay) to improve robustness
  • Deploy as real-time scoring API
  • Integrate with live financial data

Conclusion

This project demonstrates a real-world credit risk modeling pipeline, including:

  • Feature engineering from behavioral data
  • Model comparison and threshold tuning
  • Risk-based ranking
  • Probability calibration

The system functions as an early warning mechanism for identifying high-risk customers.

About

Credit default prediction system with ~80% recall that captures ~68% of defaulters in the top 10% risk segment. Built with XGBoost and calibrated probabilities; deployed on Streamlit.
