ngwanelegacie/README.md



About Me

I turn messy datasets into decisions, and I explain them so anyone can follow.

I'm Vusumuzi Nkosi, a data scientist with postgraduate training in Data Science (STADIO, NQF 8) and 9+ years as a mathematics lecturer. That teaching background is not a footnote; it is what sets me apart. I don't just build models; I communicate what they mean.

I hold a PGDip in TVET from the University of the Western Cape and am an ALX Data Science graduate. I work at the intersection of data science and finance, building end-to-end pipelines on real-world data.

Quick Facts

  • Education: PGDip Data Science @ STADIO (NQF 8) | PGDip TVET @ UWC | ALX DS Graduate | BSc Maths & CS @ Unisa
  • Location: Newcastle, KZN, South Africa
  • Focus: Credit Risk | Financial Analytics | Machine Learning
  • Open to: Data Scientist, Data Analyst, Credit Risk Analyst roles (SA / remote)

Tech Stack

Languages & Core

Python SQL R Excel

ML & Data Science

Scikit-learn XGBoost Pandas NumPy

Visualisation & BI

Power BI Tableau Matplotlib Seaborn Plotly

Tools

Jupyter Git VS Code Streamlit AWS SageMaker


Featured Projects

1. Lending Club Loan Default Analysis

2.26M real loans | AUC 0.72 | ~$95M annual impact

End-to-end credit-risk pipeline on real Lending Club data (2007-2018). The work covers SQL schema design, feature reduction from 151 to 33 variables, full EDA, and Gradient Boosting model evaluation, and it quantifies roughly $95M per year in preventable losses. It also includes Excel dashboards and a deployed Streamlit prediction app.

| Metric | Value |
| --- | --- |
| Dataset | 2,260,701 loans |
| Best model | Gradient Boosting (AUC 0.72) |
| Top feature | sub_grade (23.3% importance) |
| Business impact | ~$95M annual loss prevention |
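
For readers less familiar with the headline metric: an AUC of 0.72 can be read as a 72% chance that the model scores a randomly chosen defaulted loan above a randomly chosen repaid one. The sketch below shows that rank-based calculation in plain Python. It is illustrative only and is not code from the project; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via rank sums: P(random positive scores above random negative).

    labels: iterable of 0/1 (1 = default), scores: predicted risk scores.
    Ties in score share their average rank, so they count as half a win.
    """
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)

    # Assign 1-based ranks, averaging over runs of tied scores.
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # Mann-Whitney U statistic for the positive class, normalised to [0, 1].
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Toy example (hypothetical values, not Lending Club data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC on toy data: {toy_auc:.2f}")
```

In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used; the hand-rolled version is here only to make the interpretation concrete.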

Tech: Python SQL scikit-learn Streamlit Excel

View Project | Live Demo


2. SA Financial Indicators Dashboard

World Bank data | Python ETL | SQL + Power BI + Excel

Financial indicators dashboard built from World Bank data on South African economic metrics. Python ETL scripts for data collection and cleaning, SQL for structured storage, and Excel/Power BI reporting views.
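
As a rough illustration of the clean-then-store step described above, the sketch below parses a few raw indicator records, drops missing values, and loads the result into a SQL table. It is a minimal sketch under stated assumptions, not the repository's code: SQLite stands in for whatever database the project uses, the table layout and the `clean` and `load` helpers are hypothetical, and the values are placeholders rather than real World Bank figures.

```python
import sqlite3

# Placeholder records shaped like a downloaded indicator extract.
# The values are made up for illustration; they are NOT real data.
raw_rows = [
    {"indicator": "NY.GDP.MKTP.CD", "year": "2020", "value": "1.0"},
    {"indicator": "NY.GDP.MKTP.CD", "year": "2021", "value": None},  # missing
    {"indicator": "FP.CPI.TOTL.ZG", "year": "2021", "value": " 2.0 "},
]


def clean(rows):
    """Drop rows with missing values and cast year/value to numeric types."""
    cleaned = []
    for row in rows:
        if row["value"] is None:
            continue
        cleaned.append((row["indicator"], int(row["year"]), float(row["value"])))
    return cleaned


def load(conn, records):
    """Create the (hypothetical) indicators table and upsert the records."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS indicators (
            indicator TEXT NOT NULL,
            year INTEGER NOT NULL,
            value REAL NOT NULL,
            PRIMARY KEY (indicator, year)
        )
        """
    )
    conn.executemany(
        "INSERT OR REPLACE INTO indicators VALUES (?, ?, ?)", records
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
load(conn, clean(raw_rows))

stored = conn.execute(
    "SELECT indicator, year, value FROM indicators ORDER BY indicator, year"
).fetchall()
for record in stored:
    print(record)
```

Keying the table on (indicator, year) means re-running the load replaces existing rows instead of duplicating them, which suits a dashboard that is refreshed periodically.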

Tech: Python SQL Power BI Excel

View Project


Education & Certifications

| Qualification | Institution | Status |
| --- | --- | --- |
| PGDip in Data Science (NQF 8) | STADIO Higher Education | In Progress (2026) |
| ALX Data Science Programme | ALX Africa | Completed (2025-2026) |
| PGDip in TVET | University of the Western Cape | Completed (2024-2025) |
| BSc Mathematics & Computer Science | University of South Africa | Completed (2022-2024) |
| BEd Mathematics & Computer Science | University of South Africa | Completed (2012-2017) |

| Certification | Issuer |
| --- | --- |
| Data Science Certificate | ALX Africa |
| Data Analyst Certificate | ALX Africa |
| AI Engineering Bootcamp | Zero To Mastery Academy |
| Prompt Engineering Bootcamp | Zero To Mastery Academy |



Connect

Portfolio Email LinkedIn



Pinned

  1. lending-credit-analysis (Public, Jupyter Notebook)

    End-to-end credit risk analysis on 2.26M Lending Club loans - SQL, Python EDA, ML default prediction (AUC 0.72), and ~$95M annual impact quantification.

  2. sa-financial-dashboard-repo (Public, Python)