- A machine learning web application that predicts house prices from features such as location, number of rooms, median income, and population.
- This project uses Data Analysis, Feature Engineering and Machine Learning to predict house prices.
- The model is trained on the California Housing Dataset and deployed using Streamlit for real-time predictions.
- Python
- Pandas
- NumPy
- Seaborn
- Matplotlib
- Scikit-learn
- Streamlit
- Joblib
- Linear Regression
- Random Forest Regressor
- GridSearchCV (Hyperparameter Tuning)
- Dataset: California Housing Dataset
- Problem Type: Regression
- Target Variable: median_house_value
- Checked dataset info
- Handled missing values by dropping records that contain nulls, using dropna()
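
The cleaning step above can be sketched as follows. This is a minimal illustration on a tiny made-up frame, not the project's actual notebook code; the sample values and the variable names `missing_per_column` and `clean` are assumptions.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the California Housing data (assumption).
df = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, 235.0],
    "median_house_value": [452600.0, 358500.0, 352100.0, 341300.0],
})

# Print column dtypes and non-null counts.
df.info()

# Count missing values per column before dropping anything.
missing_per_column = df.isna().sum()

# Drop every row that contains at least one null value.
clean = df.dropna().reset_index(drop=True)
```

Dropping rows is the simplest option and matches the use of dropna() described here; imputing the missing values would be an alternative if losing rows became a concern.
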
- total_rooms
- total_bedrooms
- population
- households
- bedroom_ratio = total_bedrooms / total_rooms
- household_rooms = total_rooms / households
- ocean_proximity
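
A rough sketch of the feature engineering listed above. It assumes that the four count columns are the ones that were log-transformed, that `np.log1p` is an acceptable stand-in for whatever log function the project used, that the ratios are computed from the raw counts before the transform, and that `ocean_proximity` is one-hot encoded. The helper name `engineer_features` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up sample rows (assumption); column names follow the dataset.
df = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0],
    "total_bedrooms": [129.0, 1106.0, 190.0],
    "population": [322.0, 2401.0, 496.0],
    "households": [126.0, 1138.0, 177.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY"],
})


def engineer_features(frame):
    """Hypothetical helper: add ratio features, log-scale counts, encode category."""
    out = frame.copy()

    # Ratio features, computed from the raw counts (assumed ordering).
    out["bedroom_ratio"] = out["total_bedrooms"] / out["total_rooms"]
    out["household_rooms"] = out["total_rooms"] / out["households"]

    # Compress the right-skewed count columns; log1p(x) = log(1 + x).
    skewed = ["total_rooms", "total_bedrooms", "population", "households"]
    out[skewed] = np.log1p(out[skewed])

    # One-hot encode the categorical column into 0/1 indicator columns.
    out = pd.get_dummies(out, columns=["ocean_proximity"], dtype=int)
    return out


features = engineer_features(df)
```
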
- Histogram plots for feature distribution
- Correlation heatmap
- Scatter plot (latitude vs longitude with price)
- R² Score
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
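
For reference, the three metrics can be computed with scikit-learn roughly as shown here. The `y_true` and `y_pred` arrays are made-up numbers for illustration only, not results from this project.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up target values and predictions (assumption, for illustration only).
y_true = np.array([200000.0, 150000.0, 320000.0, 410000.0])
y_pred = np.array([210000.0, 140000.0, 300000.0, 430000.0])

# MSE: mean of squared errors, in squared target units.
mse = mean_squared_error(y_true, y_pred)

# RMSE: square root of MSE, back in the target's own units (dollars here).
rmse = float(np.sqrt(mse))

# R²: share of the target's variance explained by the predictions.
r2 = r2_score(y_true, y_pred)
```

RMSE is usually the easiest of the three to interpret because it is expressed in the same units as median_house_value.
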
- Linear Regression applied
- Random Forest scored better than Linear Regression
- GridSearchCV used to tune hyperparameters
- Final model selected based on performance
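
The comparison described above might look roughly like this. It is a sketch on synthetic data: the parameter grid, the 80/20 split, the 3-fold cross-validation, and the use of R² as the selection score are all assumptions, not the project's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the housing features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (3.0 * X[:, 0] + 2.0 * np.sin(3.0 * X[:, 1]) + X[:, 2] * X[:, 3]
     + rng.normal(scale=0.1, size=300))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Linear Regression, scored with R² on the held-out split.
linear = LinearRegression().fit(X_train_s, y_train)
linear_r2 = linear.score(X_test_s, y_test)

# Random Forest tuned with a small, hypothetical grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 8]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=3, scoring="r2"
)
search.fit(X_train_s, y_train)
forest_r2 = search.best_estimator_.score(X_test_s, y_test)

# Keep whichever model scores higher on the held-out split.
scores = {"linear_regression": linear_r2, "random_forest": forest_r2}
best_name = max(scores, key=scores.get)
```
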
- Location (latitude & longitude) strongly affects house prices
- Median income has a high impact on predictions
- Engineered features improved model accuracy
- Log transformation helped normalize skewed features
- Model saved using Joblib
- Scaler also saved
- Integrated into a Streamlit Web App
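
Saving and reloading the model together with its scaler can be sketched as below. The file names `model.joblib` and `scaler.joblib`, the temporary directory, and the toy training data are hypothetical; the project's real artifact names are not stated here.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Toy training data (assumption) so the example is self-contained.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

scaler = StandardScaler().fit(X)
model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(scaler.transform(X), y)

# Hypothetical output paths inside a temporary directory.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model.joblib")
scaler_path = os.path.join(out_dir, "scaler.joblib")

# Persist both objects; the app needs the scaler to preprocess user input.
joblib.dump(model, model_path)
joblib.dump(scaler, scaler_path)

# Reload, as the web app would at startup, and predict on a few rows.
loaded_model = joblib.load(model_path)
loaded_scaler = joblib.load(scaler_path)
sample = X[:5]
original = model.predict(scaler.transform(sample))
restored = loaded_model.predict(loaded_scaler.transform(sample))
```

Saving the scaler matters because inputs entered in the app must be transformed exactly as the training data was before they reach the model.
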
- Enter house details
- Click Predict
- Get estimated house price instantly
- pip install -r requirements.txt
- streamlit run app.py
- Add more advanced models (XGBoost, Gradient Boosting)
- Improve feature engineering
- Add map-based visualization
- Create better UI/UX

