Supervised and unsupervised machine learning models classifying vehicle silhouettes from geometric features, using Python (scikit-learn, Pandas, NumPy).
- Project Description
- Supervised Learning Notebook
- Unsupervised Learning Notebook
- Vehicle Silhouettes Dataset
- Cleaned Dataset
- Python 3.12+
- uv package manager
git clone https://github.com/krauseannelize/ml-vehicle-silhouettes.git
cd ml-vehicle-silhouettes
uv sync
uv run jupyter lab

📌 Note: uv run automatically uses the project's virtual environment, no manual activation needed
Prospect Auto, a chain of car repair shops, requested models to classify vehicles based on silhouette features. This repo demonstrates two approaches:
- Supervised classification → Predicting vehicle classes using labeled data.
- Unsupervised clustering → Grouping vehicles without labels to discover natural structure.
- Explore the Vehicle Silhouettes dataset.
- Build supervised models to classify vehicles.
- Apply unsupervised clustering methods to group vehicles.
- Compare results and assess which of the two approaches is more effective.
- Data Preparation
  - Load and clean dataset.
  - Normalize and standardize features.
  - Split into training/testing subsets.
- Supervised Learning
  - Train classification models.
  - Evaluate with accuracy, precision, recall, F1‑score.
- Unsupervised Learning
  - Apply PCA for dimensionality reduction.
  - Train clustering models (e.g., k‑means).
  - Evaluate with silhouette score and inertia.
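The data preparation and supervised steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification`, and the feature count, class count, split ratio, and choice of `LogisticRegression` settings are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataset (assumption: sizes are
# arbitrary placeholders, not the real dataset's dimensions).
X, y = make_classification(
    n_samples=600,
    n_features=8,
    n_informative=5,
    n_classes=3,
    random_state=42,
)

# Split first, so the scaler is fitted on training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize features and fit the classifier in a single pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Accuracy on both splits; a large gap would hint at overfitting.
y_pred = model.predict(X_test)
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, y_pred)

# Macro averaging gives every class the same weight.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"precision: {precision:.3f}  recall: {recall:.3f}  F1: {f1:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps test rows out of the scaling statistics, which is one common way to avoid leakage between the splits.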
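The unsupervised steps above might look something like the sketch below. It assumes synthetic data from `make_blobs`, two principal components, and a candidate range of 2 to 6 clusters; none of these values come from the project notebooks.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix (assumption: labels are
# discarded, as they would be in an unsupervised setting).
X, _ = make_blobs(n_samples=500, n_features=8, centers=3, random_state=42)

# Standardize, then project onto the first two principal components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2, random_state=42)
X_reduced = pca.fit_transform(X_scaled)

# Fit k-means for several values of k and record both metrics.
results = {}
for k in range(2, 7):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = kmeans.fit_predict(X_reduced)
    results[k] = {
        "inertia": kmeans.inertia_,
        "silhouette": silhouette_score(X_reduced, labels),
    }

for k, scores in results.items():
    print(f"k={k}  inertia={scores['inertia']:.1f}  "
          f"silhouette={scores['silhouette']:.3f}")
```

Inertia tends to fall as k grows, so it is usually read for an "elbow", while the silhouette score (between -1 and 1) can be compared directly across values of k.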
- All supervised algorithms performed consistently well across classes, confirming strong generalization without significant overfitting.
- Logistic Regression emerged as the most balanced and reliable supervised model, achieving 95.1% accuracy on training data and 93.9% on test data.
- Unsupervised clustering (K-Means, Hierarchical, DBSCAN) revealed one dominant group with smaller overlapping sub-clusters that were imbalanced and did not align with true vehicle classes.
- While clustering is not a viable alternative for Prospect Auto's classification needs, it may still provide value for exploratory analysis or anomaly detection.
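One way to quantify how well clusters align with true classes is the adjusted Rand index (ARI), where values near 0 indicate chance-level agreement and 1 indicates a perfect match. The sketch below is only an illustration of that idea: it uses synthetic data, and the hyperparameters (`n_clusters=3`, `eps=0.8`, `min_samples=5`) are assumptions rather than the settings used in the notebooks, which may have measured alignment differently.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic, partly overlapping groups with known labels (assumption:
# stands in for the vehicle features and their true classes).
X, y_true = make_blobs(
    n_samples=300, n_features=6, centers=3, cluster_std=2.0, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# The three clustering families named in the findings.
models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X_scaled)
    # Labels are only used for scoring, never for fitting.
    scores[name] = adjusted_rand_score(y_true, labels)
    # DBSCAN marks points it cannot assign to a cluster with -1.
    n_noise = int((labels == -1).sum())
    print(f"{name:12s}  ARI={scores[name]:.3f}  noise points={n_noise}")
```

The points DBSCAN labels as noise are a natural starting place for the anomaly-detection use mentioned above.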