🚀 Predictive Insurance Pricing System

An end-to-end Machine Learning project that predicts medical insurance charges using statistical analysis, feature engineering, and Linear Regression.

The application is deployed using Streamlit and allows users to estimate insurance charges in real time based on healthcare and financial attributes.

🌐 Live Demo

https://predictive-insurance-pricing-system-qnmpk5cxusutnndxyfsknk.streamlit.app/

📌 Problem Statement

Insurance providers must accurately estimate medical insurance charges based on various customer attributes such as claim amount, hospital expenditure, annual salary, smoking habits, and family details.

The objective of this project is to build a predictive system capable of estimating insurance charges using historical healthcare-related data and machine learning techniques.

🎯 Project Objectives

Analyze healthcare and insurance-related data
Identify significant factors influencing insurance charges
Build a predictive regression model
Evaluate model performance and reliability
Deploy the model for real-time predictions

📊 Exploratory Data Analysis (EDA)

Performed comprehensive exploratory data analysis to understand:

Data distributions
Missing values
Feature relationships
Outlier behavior
Correlation patterns

🧹 Data Cleaning & Preprocessing

Missing Value Handling

Applied:

Mean Imputation
Median Imputation
Mode Imputation

Checked the skewness of each feature and used Median Imputation for skewed features, since the median is less sensitive to extreme values than the mean.
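The skewness-based choice described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper `impute_by_skewness`, the 0.5 skewness threshold, and the toy column names are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def impute_by_skewness(df: pd.DataFrame, skew_threshold: float = 0.5) -> pd.DataFrame:
    """Fill missing values: median for skewed numeric columns, mean for
    roughly symmetric numeric columns, and mode for non-numeric columns."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            skew = out[col].skew()  # NaNs are skipped by default
            # Assumed rule of thumb: |skew| above the threshold counts as skewed
            if abs(skew) > skew_threshold:
                fill = out[col].median()
            else:
                fill = out[col].mean()
        else:
            fill = out[col].mode().iloc[0]  # most frequent category
        out[col] = out[col].fillna(fill)
    return out


# Toy data; the column names are illustrative, not taken from the dataset
toy = pd.DataFrame({
    "annual_salary": [30_000, 32_000, 31_000, 500_000, np.nan],  # right-skewed
    "age": [25.0, 35.0, 45.0, np.nan, 55.0],                     # symmetric
    "smoker": ["no", "no", "yes", None, "no"],                   # categorical
})
filled = impute_by_skewness(toy)
print(filled)
```

Here the skewed salary column is filled with its median, the symmetric age column with its mean, and the categorical column with its mode.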

📈 Outlier Detection & Treatment

Implemented:

Box Plot Analysis
Interquartile Range (IQR) Method

Used IQR analysis to identify and handle extreme values that could negatively impact model performance.
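As a rough sketch of the IQR method, the snippet below flags values outside `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]` and then caps them at those bounds. Capping is only one possible treatment and is an assumption here; the README does not say whether outliers were capped, removed, or transformed. The helper names and the `claim_amount` values are illustrative.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the lower and upper IQR fences for a numeric series."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values that fall outside the IQR fences (one possible treatment)."""
    lower, upper = iqr_bounds(series, k)
    return series.clip(lower=lower, upper=upper)


# Illustrative values with one extreme claim
claims = pd.Series([100, 120, 130, 140, 150, 160, 5000], name="claim_amount")

lower, upper = iqr_bounds(claims)
outliers = claims[(claims < lower) | (claims > upper)]
capped = cap_outliers(claims)

print(f"fences: [{lower}, {upper}]")
print("outliers:", outliers.tolist())
print("capped max:", capped.max())
```

The multiplier `k = 1.5` is the conventional default, also used for box plot whiskers; a larger value makes the rule more tolerant of extreme values.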

🔍 Correlation & Multicollinearity Analysis

Correlation Matrix Analysis

Computed a correlation matrix to understand linear relationships among variables.

Variance Inflation Factor (VIF)

Applied VIF to detect multicollinearity so that highly collinear features could be addressed.
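To make the two checks concrete, here is a small sketch that computes a correlation matrix with pandas and VIF directly from its definition, `VIF_j = 1 / (1 - R_j^2)`, where `R_j^2` comes from regressing feature `j` on the remaining features. The function `compute_vif` and the synthetic columns are assumptions for illustration; the project may have used a library routine instead.

```python
import numpy as np
import pandas as pd


def compute_vif(features: pd.DataFrame) -> pd.Series:
    """Compute VIF for each column by regressing it on all other columns."""
    X = features.to_numpy(dtype=float)
    vifs = {}
    for j, name in enumerate(features.columns):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Add an intercept column so R^2 is measured around the mean of y
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs[name] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(vifs, name="VIF")


# Synthetic data: claim_amount is built to be nearly collinear with
# hospital_expenditure, while annual_salary is independent of both
rng = np.random.default_rng(0)
n = 200
salary = rng.normal(50_000, 10_000, n)
hospital = rng.normal(5_000, 1_000, n)
claim = 0.9 * hospital + rng.normal(0, 100, n)

features = pd.DataFrame({
    "annual_salary": salary,
    "hospital_expenditure": hospital,
    "claim_amount": claim,
})

corr = features.corr()
vif = compute_vif(features)
print(corr.round(2))
print(vif.round(2))
```

A common rule of thumb treats a VIF above 5 or 10 as a sign of problematic multicollinearity; which cutoff this project applied is not stated.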

📊 Statistical Hypothesis Testing

Conducted:

T-Test
ANOVA

These statistical tests helped identify significant features contributing to insurance charge prediction.
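The sketch below shows how such tests might be run with SciPy, which is an assumption since SciPy is not listed among the technologies. A t-test compares mean charges between two groups, and one-way ANOVA compares means across three or more groups. The groups and numbers are synthetic and chosen only to demonstrate the calls.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two-group comparison: synthetic charges for smokers vs. non-smokers
smoker_charges = rng.normal(32_000, 5_000, 100)
nonsmoker_charges = rng.normal(8_000, 3_000, 100)

# Welch's t-test (does not assume equal variances)
t_stat, t_p = stats.ttest_ind(smoker_charges, nonsmoker_charges, equal_var=False)

# Multi-group comparison: synthetic charges by number of dependents
dependents_0 = rng.normal(10_000, 3_000, 80)
dependents_1 = rng.normal(12_000, 3_000, 80)
dependents_2 = rng.normal(20_000, 3_000, 80)

# One-way ANOVA across the three groups
f_stat, f_p = stats.f_oneway(dependents_0, dependents_1, dependents_2)

alpha = 0.05  # conventional significance level
print(f"t-test: t={t_stat:.2f}, p={t_p:.3g}, significant={t_p < alpha}")
print(f"ANOVA:  F={f_stat:.2f}, p={f_p:.3g}, significant={f_p < alpha}")
```

A p-value below the chosen significance level suggests that the grouping feature is associated with a difference in mean charges, which is one way to shortlist features.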

⚙️ Feature Engineering

Implemented:

Feature Selection
Feature Encoding
Data Transformation
Feature Scaling (Standardization)

Created an optimized preprocessing pipeline for model training.
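One way such a preprocessing pipeline could look with scikit-learn is sketched below. It standardizes numeric columns and one-hot encodes a categorical column. The column names, the choice of one-hot encoding, and the use of `ColumnTransformer` are assumptions; the README does not specify how encoding and scaling were wired together.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column groups
numeric_cols = ["annual_salary", "hospital_expenditure"]
categorical_cols = ["smoker"]

preprocessor = ColumnTransformer(
    transformers=[
        # Standardization: zero mean, unit variance per numeric column
        ("num", StandardScaler(), numeric_cols),
        # drop="first" avoids a redundant dummy column for a linear model
        ("cat", OneHotEncoder(drop="first"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

raw = pd.DataFrame({
    "annual_salary": [40_000.0, 50_000.0, 60_000.0, 70_000.0],
    "hospital_expenditure": [1_000.0, 3_000.0, 2_000.0, 4_000.0],
    "smoker": ["no", "yes", "no", "yes"],
})

X = preprocessor.fit_transform(raw)
print(np.round(X, 3))
```

Fitting the transformer on training data only and reusing it at prediction time keeps the scaling consistent between training and the deployed app.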

🤖 Model Building

Algorithm Used:

Linear Regression

Workflow:

Data Preprocessing
Train-Test Split
Model Training
Model Testing
Model Evaluation
Bias-Variance Assessment
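The workflow above can be sketched end to end on synthetic data. The 80/20 split, the random seeds, and the generated features are assumptions for illustration. Comparing the R² score on the training and test sets is used here as a simple stand-in for the bias-variance assessment: low scores on both suggest high bias, and a large gap between them suggests high variance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the processed dataset
rng = np.random.default_rng(7)
n = 500
X = rng.normal(size=(n, 3))
true_coef = np.array([4000.0, 2500.0, -1000.0])
y = 15_000.0 + X @ true_coef + rng.normal(0, 500, n)

# Train-test split (80/20 is an assumed ratio)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only to avoid leaking test statistics
scaler = StandardScaler().fit(X_train)
model = LinearRegression().fit(scaler.transform(X_train), y_train)

# Score on both splits and compare
train_r2 = model.score(scaler.transform(X_train), y_train)
test_r2 = model.score(scaler.transform(X_test), y_test)
gap = train_r2 - test_r2

print(f"train R^2: {train_r2:.4f}")
print(f"test  R^2: {test_r2:.4f}")
print(f"gap:       {gap:.4f}")
```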

📉 Model Evaluation

Evaluated model performance using:

Regression Metrics
Prediction Analysis
Bias-Variance Assessment

These evaluations helped validate the model's predictive capability.
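The README does not name the specific regression metrics, so the following sketch assumes three common ones: mean absolute error (MAE), root mean squared error (RMSE), and R². The helper `regression_report` and the sample values are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """Return MAE, RMSE, and R^2 for a set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "rmse": float(np.sqrt(mse)),  # same units as the target
        "r2": float(r2_score(y_true, y_pred)),
    }


# Hypothetical actual vs. predicted charges
y_true = np.array([10_000.0, 12_000.0, 15_000.0, 20_000.0])
y_pred = np.array([11_000.0, 11_000.0, 16_000.0, 18_000.0])

report = regression_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

MAE and RMSE are expressed in the same currency units as the charges, which makes them easier to interpret than R² alone; RMSE penalizes large errors more heavily than MAE.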

🚀 Deployment

The trained model was:

Saved using Joblib
Integrated with Streamlit
Deployed using GitHub and Streamlit Cloud

Users can provide input values and receive real-time insurance charge predictions in:

USD
INR
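A minimal sketch of the save, load, and predict cycle is shown below. It trains a tiny stand-in model, persists it with Joblib, reloads it, and reports a prediction in both currencies. The helper `predict_charges`, the toy training data, and the fixed exchange rate `USD_TO_INR` are assumptions; the rate is a placeholder and the app's actual conversion logic is not documented here. Saving the scaler with Joblib is also an assumption based on the `scaler.pkl` file name.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

USD_TO_INR = 83.0  # placeholder rate, not a live or official value


def predict_charges(model, scaler, features, usd_to_inr: float = USD_TO_INR) -> dict:
    """Scale one row of features, predict charges, and convert to INR."""
    X = np.asarray(features, dtype=float).reshape(1, -1)
    usd = float(model.predict(scaler.transform(X))[0])
    return {"USD": round(usd, 2), "INR": round(usd * usd_to_inr, 2)}


# Tiny stand-in model: charges = 1000 + 500 * x1 + 20 * x2
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 15.0], [4.0, 30.0]])
y_train = 1000.0 + 500.0 * X_train[:, 0] + 20.0 * X_train[:, 1]
scaler = StandardScaler().fit(X_train)
model = LinearRegression().fit(scaler.transform(X_train), y_train)

# Persist and reload both artifacts, mirroring the files in the repository
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "linear_regression_model.joblib"
    scaler_path = Path(tmp) / "scaler.pkl"
    joblib.dump(model, model_path)
    joblib.dump(scaler, scaler_path)
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

prediction = predict_charges(loaded_model, loaded_scaler, [2.5, 25.0])
print(prediction)
```

In a Streamlit app, a function like this would typically be called with values collected from input widgets, with the loaded model and scaler reused across requests.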

🛠️ Technologies Used

Python
Pandas
NumPy
Scikit-Learn
Statistical Analysis
T-Test
ANOVA
VIF
Linear Regression
Joblib
Streamlit
GitHub

📁 Project Structure

Predictive-Insurance-Pricing-System/
│
├── app.py
├── linear_regression_model.joblib
├── scaler.pkl
├── new_insurance_data.csv
├── Processed_Insurance_data.csv
├── requirements.txt
└── README.md

📂 Dataset Files

new_insurance_data.csv

Raw insurance dataset used for data exploration, preprocessing, and feature analysis.

Processed_Insurance_data.csv

Cleaned dataset used for model development, produced after missing value handling, outlier treatment, feature engineering, and data transformation.

💡 Key Learnings

Practical application of statistical analysis in machine learning
Missing value treatment strategies
Outlier detection using IQR
Multicollinearity analysis using VIF
Feature engineering and scaling
Linear Regression model development
Model deployment using Streamlit
End-to-end machine learning workflow

👨‍💻 Author

Daniel J

LinkedIn: https://www.linkedin.com/in/daniel-j77

GitHub: https://github.com/daniel-j77

⭐ Future Improvements

Ensemble Regression Models
XGBoost Regression
Random Forest Regression
Advanced Feature Selection Techniques
Cloud-Based Deployment
Model Monitoring and Performance Tracking
