OncoPredict - AI-Based Breast Cancer Detection System

📌 Project Overview

This project is an end-to-end bioinformatics and machine learning system that analyzes biological tumor features to predict whether a tumor is benign or malignant. Using the widely recognized Breast Cancer Wisconsin Dataset, this project demonstrates how machine learning can be used in the medical field to aid diagnoses based on digitized imaging and biological data.

🎯 Objectives

Data Analysis: Perform comprehensive Exploratory Data Analysis (EDA) to understand the distribution and correlation of 30 continuous biological variables.
Dimensionality Reduction: Apply PCA to project the high-dimensional biological data into an interpretable 2D space for visualization.
Machine Learning: Train, evaluate, and compare Logistic Regression, Random Forest, and Support Vector Machines for medical diagnostics.
Model Deployment: Build a simple interactive web application using Flask, allowing clinicians/users to input patient features and receive a prediction.

🛠️ Tech Stack

Python (Data Analysis & Modeling)
Pandas & NumPy (Data Preprocessing)
Matplotlib & Seaborn (Data Visualization)
Scikit-Learn (Machine Learning & PCA)
Flask (Web Backend Framework)
HTML/CSS/JS (Glassmorphic Web Frontend)

🗂️ Project Structure

OncoPredict/
│
├── data_analysis.ipynb     # Jupyter Notebook for EDA & biological visualizations
├── model_training.py       # ML Pipeline script to train & evaluate models
├── prediction_system.py    # Prediction logic to accept features & output class/prob
├── app.py                  # Flask backend providing endpoints and UI
├── templates/
│   └── index.html          # HTML Web Application Interface
├── static/
│   ├── style.css           # Glassmorphic CSS Styling
│   └── script.js           # Interactive UI logic
├── generate_notebook.py    # Python script automating notebook creation
├── requirements.txt        # Project dependencies
└── README.md               # Project documentation

📊 Methodology & Results

Data Preprocessing: The dataset was standardized using StandardScaler (zero mean, unit variance per feature) so that all biological measurements are on a comparable scale.
Exploratory Visualizations: Pairwise scatterplots and heatmaps show that mean radius, mean perimeter, and mean area are highly correlated with one another and separate benign from malignant cases well.
PCA: The first two principal components capture over 63% of the total variance and show distinct clustering of the two classes in 2D space.
Model Performance:
- Evaluated using Accuracy, Precision, Recall, F1-score, and ROC curve (AUC).
- Logistic Regression, Random Forest, and SVM models were trained using an 80/20 train-test split.
- The best model was selected by F1-score. Random Forest also proved to be highly interpretable thanks to its extracted feature importances.
Feature Importance: Random Forest analysis revealed that features like worst radius, worst perimeter, and worst area are critical indicators for predictions.
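
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of model_training.py: it assumes the copy of the dataset bundled with scikit-learn (load_breast_cancer), a stratified split with random_state=42, and near-default hyperparameters, none of which are confirmed by this README. Note that scikit-learn encodes 0 = malignant and 1 = benign, so the F1-score below is computed for the benign class.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the scikit-learn copy of the dataset (569 samples, 30 features).
X, y = load_breast_cancer(return_X_y=True)

# PCA on the standardized features, as used for a 2D visualization.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_scaled)
explained = pca.explained_variance_ratio_.sum()
print(f"Variance captured by 2 components: {explained:.1%}")

# 80/20 split; stratification and the seed are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Each pipeline fits its own scaler on the training split only.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": make_pipeline(
        StandardScaler(), RandomForestClassifier(random_state=42)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=42)),
}

# Score every model on the held-out 20% and keep the best F1.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
print(f"Selected by F1: {best_name}")
```

Wrapping the scaler and the classifier in one pipeline keeps test-set statistics out of the scaling step, which is one reasonable way to avoid leakage.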

🚀 How to Run Locally

1. Install Dependencies

pip install -r requirements.txt

2. Run Exploratory Data Analysis

If the jupyter command is not recognized, you can run the notebook through Python directly:

python -m jupyter notebook data_analysis.ipynb

(Alternatively, open data_analysis.ipynb in VS Code, which can run notebook cells natively.)

3. Model Training

Run model_training.py to train models, generate evaluation plots (roc_curve_comparison.png, feature_importance.png), and save artifacts (best_model.pkl, scaler.pkl).

python model_training.py
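
A sketch of how the artifacts might be written and read back. It uses the standard-library pickle module, which is an assumption based on the .pkl extension (joblib would work similarly). save_artifacts and load_artifacts are hypothetical helper names, not functions taken from model_training.py.

```python
import pickle
from pathlib import Path


def save_artifacts(model, scaler, out_dir="."):
    """Pickle the fitted model and scaler under the file names used in this README."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, obj in (("best_model.pkl", model), ("scaler.pkl", scaler)):
        with open(out / filename, "wb") as fh:
            pickle.dump(obj, fh)


def load_artifacts(in_dir="."):
    """Load the model and scaler back. Only unpickle files you trust."""
    src = Path(in_dir)
    with open(src / "best_model.pkl", "rb") as fh:
        model = pickle.load(fh)
    with open(src / "scaler.pkl", "rb") as fh:
        scaler = pickle.load(fh)
    return model, scaler
```

Saving the scaler alongside the model matters because new inputs must be transformed with the same statistics that were fitted during training.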

4. Start the Application

Boot up the Flask web backend.

python app.py

Open the URL shown in the terminal (usually http://127.0.0.1:5000/) in your browser to view the application!
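
For reference, the prediction step behind the input form could look roughly like the sketch below. predict_tumor is a hypothetical helper, not the actual code in prediction_system.py or app.py. It assumes the model exposes predict_proba and classes_ (as scikit-learn classifiers do) and that labels follow the scikit-learn encoding of this dataset (0 = malignant, 1 = benign).

```python
# Assumed label encoding (scikit-learn's copy of the dataset).
LABELS = {0: "malignant", 1: "benign"}
N_FEATURES = 30


def predict_tumor(model, scaler, features):
    """Return (label, probability) for one sample of 30 feature values."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    # Apply the same scaling that was fitted during training.
    scaled = scaler.transform([list(features)])
    probabilities = model.predict_proba(scaled)[0]
    # Pick the class with the highest predicted probability.
    class_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    class_id = int(model.classes_[class_index])
    return LABELS[class_id], float(probabilities[class_index])
```

Given a model and scaler loaded from best_model.pkl and scaler.pkl, a call such as predict_tumor(model, scaler, values) would return the predicted class name together with its probability.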

A sample report to use as input is linked below:

https://docs.google.com/document/d/1sJXyiBUusD7_LzGzv139xyZWyBF2y0Y-/edit?usp=sharing&ouid=112675565658180695174&rtpof=true&sd=true
