- Shahrukh Islam Prithibi
- Sophie Yang
- Yovindu Don
- Jade Bouchard
The goal of our analysis is to classify whether someone is a good or bad credit risk using attributes such as Credit History, Duration, and Residence. Our best-performing model is a Random Forest model. It achieved an accuracy of 0.8 on unseen data, an improvement over the dummy model's accuracy of 0.7. We also obtained a precision of 0.8, a recall of 0.95, and an F1 score of 0.87. The high recall shows that the model identifies most people who are a good credit risk. However, if this model is to have a hand in real-world decision-making, precision should be improved to reduce the number of poor credit risks classified as good credit risks (false positives). In addition, more research should be done to ensure the model produces fair and equitable recommendations.
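As a quick consistency check on the scores above, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall reported in this summary; the function name is hypothetical and is not part of the project's code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Rounded scores reported in the summary above.
reported_precision = 0.80
reported_recall = 0.95

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 2))  # 0.87
```

Because the inputs are already rounded, the result only confirms that the reported F1 score agrees with the reported precision and recall to two decimal places.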
The Statlog (German Credit Data) dataset, sourced from the UCI Machine Learning Repository, is used for classifying individuals as good or bad credit risks based on a variety of attributes. Evaluation requires a cost matrix that specifies the misclassification costs. The cost matrix indicates that it is worse to classify a customer as good when they are bad than to classify a customer as bad when they are good. The dataset contains 1000 instances with 20 features. Each feature is described by its role, type, and, where applicable, demographic category.
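To illustrate how such a cost matrix can be used to score predictions, here is a minimal sketch. The 5:1 cost weights are an assumption based on the dataset's documentation and should be checked against the UCI page; the labels and the `total_cost` helper are made up for illustration and do not come from this project's code.

```python
# Cost of each (actual, predicted) pair.
# ASSUMPTION: a bad customer predicted as good costs 5,
# a good customer predicted as bad costs 1.
COST = {
    ("good", "good"): 0,
    ("good", "bad"): 1,
    ("bad", "good"): 5,
    ("bad", "bad"): 0,
}


def total_cost(actual, predicted):
    """Sum the misclassification cost over paired actual/predicted labels."""
    return sum(COST[(a, p)] for a, p in zip(actual, predicted))


# Made-up labels, for illustration only.
actual = ["good", "good", "bad", "bad", "good"]
predicted = ["good", "bad", "good", "bad", "good"]

print(total_cost(actual, predicted))  # 1 + 5 = 6
```

Under these weights a single false positive (a bad risk classified as good) costs as much as five false negatives, which is why precision matters more than recall for this problem.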
The final report can be found here
Build and run the project using Docker by following these steps:
Setup
- First, ensure you have Docker installed and running on your machine.
- Clone this repository, and navigate to the root of the repository in a terminal window.
Choose one of the following two options for launching the Docker container.
The preferred method to run the Docker container is to use docker-compose. Run the following command in the terminal to build and start the container. This command uses the configuration specified in docker-compose.yml.
docker-compose up
Check the Developer Notes section of this README for details on how to run our analysis.
To stop the Docker container, first press Ctrl + C in the terminal where you launched the container, then run the following command:
docker-compose rm
Alternatively, you can run the Docker container by executing the following commands:
Build the Docker image (optional):
docker build -t yovindu/project --platform=linux/amd64 .
Run the Docker container:
docker run -it --rm -p 8888:8888 -v /"$(pwd)":/home/jovyan --platform=linux/amd64 yovindu/project
Check the Developer Notes section of this README for details on how to run our analysis.
To exit the container, press Ctrl + C in the terminal where you launched the container.
(The instructions below are copied from this repository)
After launching the Docker Container, in the terminal look for a URL that starts with http://127.0.0.1:8888/lab?token= . Copy and paste that URL into your browser.
You should now see the Jupyter lab IDE in your browser, with all the project files visible in the file browser pane on the left side of the screen.
Note: if you prefer to work in VS Code, you can launch the container by running the following command from the root of the project in a VS Code terminal:
docker compose run --rm myapp bash
To exit the container type exit in the terminal.
Open a terminal at the project root in Jupyter or VS Code. Use the command
make clean-all
to reset the project to a clean state (i.e., remove all files generated by previous runs of the analysis).
To run the analysis in its entirety, enter the command
make all
in the terminal in the project root.
Docker is a container solution used to manage the software dependencies for this project. The Docker image used for this project is based on the quay.io/jupyter/scipy-notebook:2024-02-24 image. Additional dependencies are specified in the Dockerfile.
Tests are run using the pytest command in the root of the project. Run:
pytest tests/*
The Credit Risk Analysis report contained herein is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. See the license file for more information. If re-using/re-mixing please provide attribution and link to this webpage. The software code contained within this repository is licensed under the MIT license. See the license file for more information.
Costa e Silva, E., Lopes, I. C., Correia, A., & Faria, S. (2020). A logistic regression model for consumer default risk. Journal of Applied Statistics, 47(13-15), 2879–2894. https://doi.org/10.1080/02664763.2020.1759030
Dobby, C., & Vossos, T. (2024, February 22). Wall Street to Follow Canada’s Hot Risk Transfer Trade. Bloomberg.com. https://www.bloomberg.com/news/articles/2024-02-22/wall-street-to-follow-canada-s-hot-capital-relief-trade
Goraieb, E., Kumar, S., & Pepanides, T. (n.d.). Credit Risk | Risk & Resilience | McKinsey & Company. https://www.mckinsey.com/capabilities/risk-and-resilience/how-we-help-clients/credit-risk
Personal characteristics, grounds of discrimination protected in the BC Human Rights Code - BC Human Rights Tribunal. (2023, May 9). BC Human Rights Tribunal. https://www.bchrt.bc.ca/human-rights-duties/personal-characteritics/