Kingsley-Eboh/README.md

Kingsley Eboh

Data Analyst

Sheffield, UK  |  kingsley.eboh49@gmail.com  |  linkedin.com/in/kingsleyeboh


About Me

I am a data analyst with seven years of experience working with complex datasets in environments where accuracy, governance and the reliability of analytical outputs directly affect the quality of decisions made. My work has consistently involved large scale data analysis, data quality assessment, root cause investigation, governance documentation and the communication of complex findings to both technical and non-technical audiences.

I came to data analytics through years of working in regulated environments where a single data error has direct consequences for real people, where audit readiness is a daily requirement and where the difference between a finding and a conclusion matters. That background shapes how I approach every analytical problem, regardless of domain or industry.

The projects in this portfolio apply that experience to large-scale, real-world datasets using Python, SQL, Power BI and machine learning. They are self-directed, independently built and fully documented.


What Makes Me Different

I ask whether my conclusion actually follows from my evidence. I think about who will read my findings and what they need to understand before I decide how to present them.

I care about the person at the end of every analysis I carry out and the impact that analysis will have on the decision they are about to make. That keeps me honest about what I can and cannot claim, careful about how I communicate uncertainty and clear about the difference between what the data shows and what I think it means.

I do not need to be managed toward finishing something or reminded that accuracy matters. Both are reflections of a standard I hold myself to regardless of whether anyone is watching.

I apply the same care to how I handle data. I am mindful of access boundaries, protective of data integrity throughout the analytical process and aware that behind every dataset there are real people whose information has been entrusted to the organisations that hold it.

I have worked through problems alone and contributed to team efforts in professional settings where collaboration and clear communication produced better outcomes than any one person would have reached alone. I know when to ask for input and when to get on with it.


Core Skills

Data Analysis and Engineering: Extracting, transforming and analysing large datasets using Python and SQL. Building end to end analytical pipelines from raw data sources to dashboard delivery. Identifying patterns, trends and anomalies in complex multi-variable datasets across regulated and operational environments.

Machine Learning and Predictive Analytics: Building, tuning and evaluating classification models using scikit-learn and XGBoost. Applying SHAP value analysis to produce transparent and interpretable model explanations suitable for both technical and non-technical audiences. Handling class imbalance, hyperparameter optimisation and threshold tuning in business-relevant prediction problems.

Statistical Analysis and Signal Detection: Applying statistical signal detection methods to identify anomalies and elevated patterns in large datasets. Cross-validating findings across Python and SQL to ensure analytical integrity. Presenting findings with appropriate uncertainty and analytical hedging.

Data Quality and Governance: Assessing, validating and documenting data quality across complex operational datasets. Maintaining audit ready processes and traceable analytical outputs. Applying governance frameworks in regulated and complex data environments.

Data Visualisation and Reporting: Building interactive Power BI dashboards designed and formatted for presentation to senior and board level audiences. Producing publication quality charts using matplotlib and seaborn. Writing analytical findings in plain language accessible to non-technical stakeholders.

Database and SQL: Designing and querying relational databases in PostgreSQL. Writing analytical SQL including window functions, CTEs and aggregations. Loading, transforming and validating data between Python and SQL environments.


Portfolio Projects

Bank Customer Churn Prediction

The business question: Which retail banking customers are most likely to close their accounts, and what factors are driving that decision?

10,000 customer records analysed across demographic, behavioural and product usage dimensions. XGBoost model tuned to 86.6% ROC-AUC with SHAP value analysis identifying Age, Number of Products and Member Activity Status as the three strongest predictors of churn. Delivered findings across 13 Python analyses, 12 SQL queries and a 2 page Power BI dashboard.

Outcome: Germany recorded the highest churn rate at 32.4%, double France's 16.2%. Dormant accounts churned at 47.1%, more than double the overall rate of 20.4%. Customers holding 3 and 4 products churned at 82.7% and 100.0% respectively, signalling serious product concentration risk. High value customers in Germany churned at 29.0% with average balances exceeding £149,000. All 20 of the highest risk customers in the test set, as ranked by the model, went on to churn (100% precision within that top 20). Retaining just 20% of at-risk customers across all risk tiers would preserve over £30 million in customer balances.
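The segment-level churn rates above are simple ratios of churned customers to total customers within each group. A minimal sketch of that calculation follows, using only the Python standard library. The function name, the `exited` field and the toy records are hypothetical and invented for illustration; they are not the project's code or data, which the stack list suggests used pandas and SQL.

```python
from collections import defaultdict


def churn_rate_by_segment(records, segment_key, churn_key="exited"):
    """Return {segment: churn rate in percent} for a list of dict records."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in records:
        segment = row[segment_key]
        totals[segment] += 1
        # Count the customer as churned when the flag is truthy (1 or True).
        churned[segment] += 1 if row[churn_key] else 0
    return {seg: round(100.0 * churned[seg] / totals[seg], 1) for seg in totals}


# Toy records, invented for illustration only (not the project's data).
toy_customers = [
    {"country": "Germany", "exited": 1},
    {"country": "Germany", "exited": 0},
    {"country": "Germany", "exited": 1},
    {"country": "France", "exited": 0},
    {"country": "France", "exited": 0},
    {"country": "France", "exited": 1},
    {"country": "France", "exited": 0},
]

rates = churn_rate_by_segment(toy_customers, "country")
print(rates)
```

The same function works for any categorical field, for example a product count or an activity flag, by changing `segment_key`.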

Transferable value: Predictive modelling, customer segmentation, churn analysis, machine learning explainability, financial services analytics, retention strategy support.

Stack: Python · XGBoost · scikit-learn · SHAP · PostgreSQL · Power BI · SQL · Jupyter · pandas

View Project


Clinical Trials Analysis

The business question: What do 10 years of clinical trial registrations reveal about pipeline efficiency, phase attrition and therapeutic area trends across major pharmaceutical sponsors?

164,487 interventional trial records retrieved from the ClinicalTrials.gov API covering January 2015 to December 2024, representing the complete available population of trials meeting the search criteria. Delivered findings across 13 Python analyses, 10 SQL queries and a 2 page Power BI dashboard.

Outcome: The Phase 2 to Phase 3 transition was identified as the highest attrition point in the pipeline, with a 54.8% transition rate. Oncology recorded the highest termination rate at 17.7% and the highest overall attrition rate at 25.1%. Trial registrations peaked at 18,748 in 2021 and declined by 20.6% in 2024. Psychiatry and Mental Health was identified as the second largest therapeutic area with 11,147 trials and among the lowest attrition rates, which points to a significant and potentially underinvested area of clinical activity.

Transferable value: Life sciences analytics, pipeline analysis, sponsor performance benchmarking, therapeutic area trending, regulatory data engineering, API data retrieval.

Stack: Python · PostgreSQL · Power BI · SQL · Jupyter · pandas · REST API

View Project


NHS Referral to Treatment: Performance Analysis

The business question: Is the NHS meeting its 18-week constitutional standard and where are the greatest performance pressures?

11 months of national RTT data covering 515 NHS trusts analysed across 23 treatment specialties. Delivered findings across 15 Python analyses, 10 SQL queries and a 2 page Power BI dashboard.

Outcome: The NHS missed the 92% standard in every reporting period. The waiting list fell from 7.42 million to 7.16 million patients. Oral Surgery was identified as the worst performing specialty at 51.5%.
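Performance against the standard comes down to the share of waiting patients who are within 18 weeks, compared with the 92% threshold. The sketch below shows that check under stated assumptions: the function name and the example counts are hypothetical, and the project's own calculation may handle exclusions or data quality rules that are not shown here.

```python
STANDARD_PCT = 92.0  # threshold quoted in this README for the 18-week standard


def rtt_performance(within_18_weeks, total_waiting):
    """Return (percent within 18 weeks, whether the standard is met)."""
    if total_waiting <= 0:
        raise ValueError("total_waiting must be positive")
    if not 0 <= within_18_weeks <= total_waiting:
        raise ValueError("within_18_weeks must be between 0 and total_waiting")
    pct = 100.0 * within_18_weeks / total_waiting
    return round(pct, 1), pct >= STANDARD_PCT


# Invented counts for illustration only: 515 of 1,000 pathways within 18 weeks.
example_pct, example_met = rtt_performance(515, 1000)
print(example_pct, example_met)
```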

Transferable value: Performance benchmarking, KPI monitoring, trend analysis, operational reporting, large dataset handling.

Stack: Python · PostgreSQL · Power BI · SQL · Jupyter · pandas

View Project


NHS A&E: Emergency Care Analysis

The business question: What is the scale of A&E performance failure and how does winter pressure affect emergency care capacity?

Full year A&E data across 200 NHS providers analysed across 8 dimensions including seasonal variation, provider benchmarking and regional comparison.

Outcome: 26.9 million attendances recorded. 4-hour breach rate 39.4%, nearly double the NHS target. 570,931 patients waited 12 or more hours before being admitted to hospital.

Transferable value: Seasonal analysis, provider benchmarking, capacity planning, operational performance reporting.

Stack: Python · PostgreSQL · Power BI · SQL · Jupyter · pandas · matplotlib · seaborn

View Project


FDA Pharmacovigilance: Signal Detection Analysis

The business question: Which drugs in the FDA adverse event database carry the highest safety risk and what specific reactions are statistically elevated?

6,000 adverse event reports retrieved via API across five pharmaceutical products. Applied statistical signal detection methodology to identify drug reaction combinations reported more frequently than expected by chance. Delivered findings across 15 Python analyses, 15 SQL queries and a 3 page Power BI dashboard.

Outcome: Identified Ibuprofen as carrying the highest mortality signal, with a 20.50% death rate. Detected drug withdrawal syndrome in Paracetamol combination products at a signal strength of 777, meaning it was reported 777 times more frequently than expected by chance. Confirmed a lactic acidosis signal for Metformin, consistent with clinical literature.
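This README does not name the disproportionality measure behind the signal strength figure. One widely used option is the proportional reporting ratio (PRR), which compares how often a reaction appears in reports for one drug with how often it appears in reports for all other drugs. The sketch below assumes PRR purely for illustration; the function name and the counts are hypothetical and are not taken from the project.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of report counts.

    a: drug of interest, reaction of interest
    b: drug of interest, all other reactions
    c: all other drugs, reaction of interest
    d: all other drugs, all other reactions
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each drug group needs at least one report")
    if c == 0:
        raise ValueError("PRR is undefined when no comparison reports contain the reaction")
    rate_for_drug = a / (a + b)        # share of the drug's reports with the reaction
    rate_for_others = c / (c + d)      # share of all other reports with the reaction
    return rate_for_drug / rate_for_others


# Invented counts for illustration only.
toy_prr = proportional_reporting_ratio(a=20, b=80, c=10, d=890)
print(round(toy_prr, 2))
```

A ratio well above 1 means the reaction is reported disproportionately often for that drug. It is a reporting signal that warrants review, not evidence of causation.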

Transferable value: Signal detection, anomaly identification, regulatory data analysis, API data engineering, executive dashboard delivery.

Stack: Python · PostgreSQL · Power BI · SQL · Jupyter · REST API

View Project


Enterprise Security Detection Lab

The business question: Can enterprise authentication attacks be reliably detected using Windows Security event logging?

This project demonstrates structured analytical thinking, audit log analysis and pattern detection applied to security event data, skills that transfer directly to any environment where data integrity, access governance and anomaly identification matter.

Production-modelled Active Directory environment built on Windows Server 2022. Simulated brute-force, privilege escalation and authentication abuse scenarios. Validated detection across 9 Windows Security event captures.

Outcome: Successfully detected all simulated attack scenarios. Validated detection coverage across authentication, privilege and process execution event categories.

Transferable value: Audit log analysis, structured event data validation, pattern detection in large log datasets, enterprise infrastructure understanding, data security awareness.

Stack: Windows Server 2022 · Active Directory · PowerShell · VirtualBox

View Project


Certifications

  • CompTIA Security+
  • Google Cybersecurity Certificate
  • AWS Cloud Practitioner Essentials · Amazon Web Services

How I Work

I approach every analysis the same way regardless of domain or industry.

  1. Start with the business question: What decision does this analysis need to support?
  2. Understand the data: Assess completeness, quality and limitations before drawing conclusions.
  3. Apply appropriate methodology: Match the analytical approach to the question being asked.
  4. Validate findings: Cross-check results across different tools and approaches.
  5. Communicate clearly: Present findings in language a non-technical stakeholder can act on.

This workflow has been applied consistently across seven years of professional analytical work and across every project in this portfolio.
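Step 4, cross-checking results across tools, can be sketched as computing the same aggregate twice and comparing the answers. The example below is an illustration under stated assumptions: it uses an in-memory SQLite database from the Python standard library as a stand-in for PostgreSQL, and the table name, column names and rows are hypothetical.

```python
import sqlite3

# Toy rows of (country, exited), invented for illustration only.
rows = [
    ("Germany", 1), ("Germany", 0),
    ("France", 0), ("France", 1), ("France", 0),
    ("Spain", 0),
]


def counts_in_python(data):
    """Return {country: (total customers, churned customers)} using plain Python."""
    result = {}
    for country, exited in data:
        total, churned = result.get(country, (0, 0))
        result[country] = (total + 1, churned + exited)
    return result


def counts_in_sql(data):
    """Return the same aggregate computed with a SQL GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (country TEXT, exited INTEGER)")
        conn.executemany("INSERT INTO customers VALUES (?, ?)", data)
        cursor = conn.execute(
            "SELECT country, COUNT(*), SUM(exited) FROM customers GROUP BY country"
        )
        return {country: (total, churned) for country, total, churned in cursor}
    finally:
        conn.close()


python_result = counts_in_python(rows)
sql_result = counts_in_sql(rows)

# Any key listed here is a discrepancy to investigate before reporting.
mismatches = {
    key
    for key in python_result.keys() | sql_result.keys()
    if python_result.get(key) != sql_result.get(key)
}
print(mismatches)
```

An empty set of mismatches does not prove the figure is right, but a non-empty one is a cheap early warning that one of the two pipelines has a filtering, join or type-handling difference.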


Currently Building

Sales Forecasting (Retail): Predicting future sales volumes using historical transaction data to support inventory planning and revenue forecasting decisions.

Credit Card Fraud Detection (Finance): Identifying fraudulent transactions using anomaly detection and classification methods on imbalanced financial data.

Pinned

  1. fda-adverse-events-analysis (Public, Jupyter Notebook)

    This project applies pharmacovigilance signal detection to 6,000 FDA adverse event reports across five pharmaceutical products to identify elevated drug reaction signals, demographic risk profiles …

  2. nhs-rtt-analysis (Public, Jupyter Notebook)

    Exploratory data analysis of NHS RTT waiting times across the 2025/26 financial year

  3. nhs-trend-analysis (Public, Jupyter Notebook)

    This project analyses a full financial year of NHS A&E attendances and emergency admissions across all providers in England to identify performance trends, seasonal pressures and provider level var…

  4. clinical-trials-analysis (Public, Jupyter Notebook)

    This project analyses ten years of clinical trial activity across major pharmaceutical sponsors to identify pipeline efficiency, phase attrition and therapeutic area trends. It delivers strategic i…

  5. bank-churn-prediction (Public, Jupyter Notebook)

    This project predicts which retail banking customers are most likely to churn and why, enabling targeted retention before account closure. It identifies high-risk demographic and behavioural segmen…