This project focuses on classifying bone conditions using deep learning models trained on X-ray images. The models used include VGG16, VGG19, InceptionV3, ResNet50, Xception, AlexNet, MobileNetV2 and a Custom CNN. The goal is to accurately classify images into three categories:
- Osteopenia
- Osteoporosis
- Normal
The dataset consists of X-ray images of bones, divided into three classes. The images were preprocessed by resizing, normalization, and augmentation to enhance the models' performance.
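As an illustration, the resizing and normalization steps might look like the NumPy sketch below. The target size, interpolation method, and augmentation set shown here are assumptions for illustration; the actual pipeline would typically use a library such as OpenCV or Keras for these operations.

```python
import numpy as np

def preprocess(image, size=(224, 224)):
    """Resize a grayscale X-ray (nearest-neighbor) and normalize pixels to [0, 1]."""
    h, w = image.shape
    rows = np.arange(size[0]) * h // size[0]   # source row index for each target row
    cols = np.arange(size[1]) * w // size[1]   # source column index for each target column
    resized = image[rows][:, cols]
    return resized.astype(np.float32) / 255.0

def augment_flip(image):
    """One simple augmentation: horizontal flip."""
    return image[:, ::-1]
```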
| X-ray Images | | | | Classification |
|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | Normal |
| ![]() | ![]() | ![]() | ![]() | Osteopenia |
| ![]() | ![]() | ![]() | ![]() | Osteoporosis |
We have trained and evaluated the following deep learning models:
- VGG16
- VGG19
- InceptionV3
- ResNet50
- Xception
- AlexNet
- Custom CNN
- Late Fusion
- DenseNet121
- VGG16 + VGG19
- InceptionV3 + Xception
- ResNet50 + DenseNet121
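The combined models above fuse base networks at the prediction level. A minimal sketch of late fusion, assuming each model outputs class probabilities (the arrays below are illustrative stand-ins, not real model outputs):

```python
import numpy as np

# Hypothetical softmax outputs from two models over 2 samples x 3 classes
probs_a = np.array([[0.6, 0.3, 0.1],
                    [0.2, 0.5, 0.3]])
probs_b = np.array([[0.4, 0.4, 0.2],
                    [0.1, 0.3, 0.6]])

# Late fusion: average the probability distributions, then take the argmax
fused = (probs_a + probs_b) / 2.0
predictions = fused.argmax(axis=1)
print(predictions)  # [0 2]
```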
Each model was trained with the same dataset and evaluated using precision, recall, f1-score, accuracy, and confusion matrices.
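These per-class metrics can be derived directly from a confusion matrix; a small NumPy sketch, assuming rows are true classes and columns are predicted classes:

```python
import numpy as np

def classification_metrics(cm):
    """Per-class precision, recall, F1, and overall accuracy from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)                  # TP / (TP + FP), per column
    recall = tp / cm.sum(axis=1)                     # TP / (TP + FN), per row
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy
```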
Below is a summary of the classification performance for each model:
```
              precision  recall  f1-score  support
Osteopenia         0.82    0.55      0.66       75
Osteoporosis       0.60    0.86      0.71      159
Normal             0.83    0.60      0.70      156
accuracy                             0.70      390
macro avg          0.75    0.67      0.69      390
weighted avg       0.74    0.70      0.70      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.81    0.73      0.77       75
Osteoporosis       0.68    0.88      0.77      159
Normal             0.85    0.63      0.73      156
accuracy                             0.75      390
macro avg          0.78    0.75      0.75      390
weighted avg       0.77    0.75      0.75      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.85    0.88      0.86       75
Osteoporosis       0.88    0.91      0.89      159
Normal             0.91    0.87      0.89      156
accuracy                             0.88      390
macro avg          0.88    0.88      0.88      390
weighted avg       0.89    0.88      0.88      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.87    0.35      0.50       75
Osteoporosis       0.57    0.92      0.70      159
Normal             0.86    0.55      0.67      156
accuracy                             0.66      390
macro avg          0.76    0.61      0.62      390
weighted avg       0.74    0.66      0.65      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.81    0.89      0.85       75
Osteoporosis       0.89    0.83      0.86      159
Normal             0.89    0.91      0.90      156
accuracy                             0.87      390
macro avg          0.86    0.88      0.87      390
weighted avg       0.88    0.87      0.87      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.86    0.85      0.86       75
Osteoporosis       0.82    0.88      0.85      159
Normal             0.89    0.83      0.86      156
accuracy                             0.85      390
macro avg          0.86    0.85      0.85      390
weighted avg       0.86    0.85      0.85      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.75    0.84      0.79       75
Osteoporosis       0.85    0.85      0.85      159
Normal             0.93    0.88      0.90      156
accuracy                             0.86      390
macro avg          0.84    0.86      0.85      390
weighted avg       0.86    0.86      0.86      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.82    0.87      0.84       75
Osteoporosis       0.79    0.87      0.83      159
Normal             0.88    0.77      0.82      156
accuracy                             0.83      390
macro avg          0.83    0.83      0.83      390
weighted avg       0.83    0.83      0.83      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.82    0.75      0.78       75
Osteoporosis       0.87    0.92      0.90      159
Normal             0.94    0.92      0.93      156
accuracy                             0.89      390
macro avg          0.88    0.86      0.87      390
weighted avg       0.89    0.89      0.89      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.73    0.72      0.72       75
Osteoporosis       0.75    0.83      0.79      159
Normal             0.86    0.78      0.82      156
accuracy                             0.79      390
macro avg          0.78    0.78      0.78      390
weighted avg       0.79    0.79      0.79      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.78    0.83      0.80       75
Osteoporosis       0.75    0.86      0.80      159
Normal             0.89    0.73      0.80      156
accuracy                             0.80      390
macro avg          0.81    0.81      0.80      390
weighted avg       0.81    0.80      0.80      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.78    0.77      0.78       75
Osteoporosis       0.83    0.87      0.85      159
Normal             0.89    0.85      0.87      156
accuracy                             0.84      390
macro avg          0.83    0.83      0.83      390
weighted avg       0.84    0.84      0.84      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.83    0.57      0.68       75
Osteoporosis       0.79    0.92      0.85      159
Normal             0.85    0.84      0.85      156
accuracy                             0.82      390
macro avg          0.82    0.78      0.79      390
weighted avg       0.82    0.82      0.82      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.86    0.81      0.84       75
Osteoporosis       0.85    0.94      0.89      159
Normal             0.94    0.85      0.89      156
accuracy                             0.88      390
macro avg          0.88    0.87      0.87      390
weighted avg       0.89    0.88      0.88      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.75    0.85      0.80       75
Osteoporosis       0.84    0.86      0.85      159
Normal             0.93    0.84      0.88      156
accuracy                             0.85      390
macro avg          0.84    0.85      0.84      390
weighted avg       0.86    0.85      0.85      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.80    0.88      0.84       75
Osteoporosis       0.81    0.88      0.84      159
Normal             0.93    0.79      0.86      156
accuracy                             0.85      390
macro avg          0.84    0.85      0.84      390
weighted avg       0.85    0.85      0.85      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.76    0.85      0.81       75
Osteoporosis       0.79    0.88      0.83      159
Normal             0.95    0.78      0.86      156
accuracy                             0.84      390
macro avg          0.83    0.84      0.83      390
weighted avg       0.85    0.84      0.84      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.72    0.80      0.76       75
Osteoporosis       0.82    0.86      0.84      159
Normal             0.94    0.85      0.89      156
accuracy                             0.84      390
macro avg          0.83    0.83      0.83      390
weighted avg       0.85    0.84      0.84      390
```

```
              precision  recall  f1-score  support
Osteopenia         0.83    0.76      0.79       75
Osteoporosis       0.82    0.91      0.86      159
Normal             0.90    0.83      0.87      156
accuracy                             0.85      390
macro avg          0.85    0.84      0.84      390
weighted avg       0.85    0.85      0.85      390
```

Each model has an associated confusion matrix and performance graphs showcasing:
- Training & Validation Accuracy
- Training & Validation Loss
- Comparative Model Performance
Grad-CAM (Gradient-weighted Class Activation Mapping) is a powerful visualization technique used to understand which regions of an input image a Convolutional Neural Network (CNN) focuses on when making predictions.
Grad-CAM uses the gradients of any target class flowing into the final convolutional layer to generate a heatmap that highlights the important regions in the image for prediction. It helps in making deep learning models more interpretable, especially in tasks like image classification.
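The weighting step at the heart of Grad-CAM can be sketched in NumPy as follows. In a real pipeline the gradients would come from the deep-learning framework (e.g., via TensorFlow's `tf.GradientTape`); the arrays here are stand-ins for those values.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """
    Compute a Grad-CAM heatmap from the final conv layer's activations
    and the gradients of the target class score w.r.t. those activations.

    feature_maps: (H, W, K) activations of the last convolutional layer
    gradients:    (H, W, K) gradients of the class score w.r.t. feature_maps
    """
    # alpha_k: global-average-pool the gradients over the spatial dimensions
    weights = gradients.mean(axis=(0, 1))                          # (K,)
    # Weighted sum of feature maps, then ReLU to keep only positive influence
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)   # (H, W)
    # Normalize to [0, 1] so the map can be rendered as a heatmap
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```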
The heatmap is overlaid on the original image using a colormap (usually Jet), where each color indicates a different level of importance:
| Color | Importance Level | Description |
|---|---|---|
| 🟣 Purple | Very Low | Negligible impact; mostly ignored by the model. |
| 🔵 Blue | Low | Regions the model considers less important. |
| 🟢 Green | Medium-Low | Moderately important areas, not critical. |
| 🟠 Orange / Pink | Medium | Contributing regions, not the most critical but relevant. |
| 🟡 Yellow | Medium-High | Areas contributing more to the prediction. |
| 🔴 Red | High | Most influential regions driving the prediction. |
🔥 Red and Yellow regions show where the model is "looking" the most while making its decision.
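A rough sketch of how a normalized heatmap is colorized and blended onto the image. The piecewise-linear colormap below is only an approximation of Jet for illustration; real code would typically use OpenCV's `cv2.applyColorMap` or Matplotlib's colormaps.

```python
import numpy as np

def jet_like(v):
    """Map values in [0, 1] to approximate Jet RGB (low -> blue, high -> red)."""
    v = np.clip(v, 0.0, 1.0)
    r = np.clip(1.5 - np.abs(4.0 * v - 3.0), 0.0, 1.0)
    g = np.clip(1.5 - np.abs(4.0 * v - 2.0), 0.0, 1.0)
    b = np.clip(1.5 - np.abs(4.0 * v - 1.0), 0.0, 1.0)
    return np.stack([r, g, b], axis=-1)

def overlay(image_rgb, heatmap, alpha=0.4):
    """Blend a (H, W) heatmap in [0, 1] onto an (H, W, 3) RGB image in [0, 1]."""
    return (1 - alpha) * image_rgb + alpha * jet_like(heatmap)
```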
- ✅ Helps validate model behavior
- ✅ Identifies biased or incorrect attention
- ✅ Useful for debugging and model improvement
- ✅ Enhances explainability for sensitive applications (e.g., medical imaging)
- Image classification (e.g., "Is this osteoporosis or not?")
- Object detection
- Medical diagnosis interpretation (e.g., X-ray analysis)
| Grad-CAM Heatmaps | | |
|---|---|---|
| ![]() | ![]() | ![]() |
| VGG16 | VGG19 | InceptionV3 |
| ![]() | ![]() | ![]() |
| XceptionNet | ResNet50 | DenseNet121 |
| ![]() | ![]() | |
| Late Fusion | Custom CNN | |
Among all models, Custom CNN performed the best with 89% accuracy, followed by InceptionV3 at 88%. The VGG and ResNet architectures showed moderate performance. The confusion matrices and graphs provide further insights into model performance.
This project uses Streamlit to create a web interface for osteoporosis detection.
Open a terminal, clone the repository, and install the dependencies:

```bash
git clone https://github.com/dpavansekhar/Osteoporosis-Detection-using-Machine-Learning.git
cd Osteoporosis-Detection-using-Machine-Learning
pip install -r requirements.txt
```

Then, from the project folder, launch the app:

```bash
streamlit run interface/app.py
```

This will open the app in your default browser at http://localhost:8501.
Dogga Pavan Sekhar - AI/ML Researcher
This project was developed as part of an ongoing research initiative in medical image classification using deep learning.

















































