GitHub - Subrahmanyajoshi/Cancer-Detection-using-GCP: Classifies medical images as benign or cancerous using Google Cloud's Vertex AI.

Overview

The goal of this project is to provide a platform for building and training machine learning models for medical image classification using Google Cloud's Vertex AI.
In this project I used mammography (breast cancer) images, but other types of medical image datasets should work as well after tweaking the model parameters here.

Dataset

The dataset was obtained from the Mendeley website here.
The dataset at the link above contains DDSM, INBREAST, and a directory combining the DDSM, INBREAST and MIAS datasets. Only the DDSM dataset was used in this project.

Preparing Train dataset

No augmentation was done, since the dataset was already augmented. If augmentation is required, the procedure is available here.
Open the notebook train_data_creator.ipynb, located at tools/preprocessors.
The input data folder should contain one folder per class, each holding that class's images.
Give an empty directory as the destination path and run the notebook.
Once the notebook has run to completion, the following folders and files will have been created in the destination directory:
- all_images.zip: all images to be used for training, in zipped format.
- train: contains 2 npy files, one with the training image names and one with their labels.
- val: contains 2 npy files, one with the validation image names and one with their labels.
The entire destination folder must be uploaded to Google Cloud Storage before training starts.
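
As a sanity check before uploading, the layout described above can be verified with a short script. This is a minimal sketch and not part of the repository: the function names `check_destination_layout` and `plan_upload` are hypothetical, and the only assumption is what is listed above (an all_images.zip file plus train and val folders holding two .npy files each). The exact .npy file names are not assumed.

```python
from pathlib import Path


def check_destination_layout(dest):
    """Return a list of problems found in the notebook's destination folder."""
    dest = Path(dest)
    problems = []
    if not (dest / "all_images.zip").is_file():
        problems.append("missing all_images.zip")
    for split in ("train", "val"):
        split_dir = dest / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split}")
            continue
        npy_count = len(list(split_dir.glob("*.npy")))
        if npy_count != 2:
            problems.append(f"{split}: expected 2 npy files, found {npy_count}")
    return problems


def plan_upload(dest, prefix):
    """Map every file under dest to the object name it would get in a bucket.

    `prefix` is a hypothetical folder name inside the bucket.
    """
    dest = Path(dest)
    return sorted(
        (str(path), f"{prefix}/{path.relative_to(dest).as_posix()}")
        for path in dest.rglob("*")
        if path.is_file()
    )
```

An empty list from `check_destination_layout` means the folder matches the description. The pairs returned by `plan_upload` can then be handed to whichever upload tool is in use.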

Training models locally

Install the packages listed in requirements_baremetal.txt

pip install -r requirements_baremetal.txt

Open the config file at config/config.yaml and update it as needed.
From the project root, run the following. It adds the project root to the PYTHONPATH environment variable so that the project's modules can be imported easily.

export PYTHONPATH=$(pwd):${PYTHONPATH}

Run the trainer

python3 -m detectors.detector --train --config='./config/config.yaml'
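
The command above passes a mode flag and a config path to the detectors.detector module. How the module parses them is not shown in this README, so the following is only a hypothetical sketch of an equivalent command-line interface built with argparse. Treating --train and --predict as mutually exclusive is an assumption, as are the names `build_parser` and `parse_mode`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in this README."""
    parser = argparse.ArgumentParser(prog="detector")
    # Assumption: exactly one of the two modes must be chosen.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true", help="train a model")
    mode.add_argument("--predict", action="store_true", help="run predictions")
    parser.add_argument("--config", required=True, help="path to config.yaml")
    return parser


def parse_mode(argv):
    """Return ('train' or 'predict', config_path) for a list of arguments."""
    args = build_parser().parse_args(argv)
    return ("train" if args.train else "predict"), args.config
```

With this sketch, `parse_mode(["--train", "--config=./config/config.yaml"])` yields the mode and the config path as a pair, and passing both mode flags at once makes argparse exit with an error.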

Submitting Training job to Vertex AI

Go to the Google Cloud console and create an AI Notebooks instance. If you are unsure how to do that, follow the procedure given here. (Create the notebook with low specifications, since the actual training will not run there. It only acts as a base machine for submitting the job to Vertex AI. An n1-standard-2 machine, which has 2 vCPUs and 7.5 GB of memory, is a good choice.)
Launch JupyterLab and open a terminal.
Clone this repository.

git clone https://github.com/Subrahmanyajoshi/Breast-Cancer-Detection.git

Create a Google Cloud Storage bucket. If you are unsure how to do that, follow the procedure given here.
Upload the training dataset folder, which contains the all_images.zip file along with the 'train' and 'val' folders of npy files, into the newly created bucket.
Open the config file at config/config.yaml and update it as needed. When specifying paths inside the bucket, give full paths starting with 'gs://'.
Open the detectors/vertex_ai_job_submitter.sh shell script.
Change the environment variables if required.
Run the shell script

./detectors/vertex_ai_job_submitter.sh

The shell script will create a custom training job and submit it to Vertex AI.
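
Because the config must use full 'gs://' paths for anything inside the bucket, a small check can catch malformed paths before a job is submitted. The helper below is an illustrative sketch. `split_gcs_path` is a hypothetical name and is not part of this repository.

```python
def split_gcs_path(path):
    """Split 'gs://bucket/some/object' into (bucket, object_name).

    Raises ValueError when the path does not start with 'gs://' or has
    no bucket name.
    """
    scheme = "gs://"
    if not path.startswith(scheme):
        raise ValueError(f"not a full Cloud Storage path: {path!r}")
    bucket, _, object_name = path[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {path!r}")
    return bucket, object_name
```

Running this over every bucket path in the config before calling the submitter script turns a late job failure into an immediate local error.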

Activating and using TensorBoard to monitor training

TensorBoard runs on port 6006 by default.
A firewall rule must be set up to open this port. To do so, follow the procedure given here.
Once done, open a terminal and run the following, providing the path to the TensorBoard log directory specified in the config file.

tensorboard --logdir <path/to/log/directory> --bind_all

Get the external IP of the VM on which the notebook is running from the 'VM Instances' page of the Google Cloud console.
Open the following link in a browser.

http://<external_ip_address>:6006/
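
If the page does not load, it helps to check whether the port is reachable at all, which separates firewall problems from TensorBoard problems. The following is a minimal sketch using only the standard library. `is_port_open` is a hypothetical helper, and the default of 6006 is the TensorBoard port mentioned in this section.

```python
import socket


def is_port_open(host, port=6006, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling it from a machine outside the VM with the VM's external IP shows whether the firewall rule took effect. Calling it on the VM itself with "127.0.0.1" shows whether TensorBoard is listening.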

Predicting

Open the config file at config/config.yaml and update the model path and data path at the very bottom.
Install the opencv-python-headless package

pip3 install opencv-python-headless==4.5.3.56

From the project root, run the following. It adds the project root to the PYTHONPATH environment variable so that the project's modules can be imported easily.

export PYTHONPATH=$(pwd):${PYTHONPATH}

Run the predictor

python3 -m detectors.detector --predict --config='./config/config.yaml'

Classification results will be printed to the screen. If the data path is a directory, all images inside it and its sub-directories will be read and run against the model.
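
The directory behaviour described above amounts to a recursive file walk. The following is a sketch of that idea, not the repository's actual code. `collect_images` is a hypothetical name, and the set of accepted file extensions is an assumption.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def collect_images(data_path):
    """Return image paths for a single file or, recursively, a directory."""
    data_path = Path(data_path)
    if data_path.is_file():
        return [data_path]
    return sorted(
        p for p in data_path.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Listing the files this way before running the predictor makes it easy to confirm how many images the data path will contribute.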

Results

The benign and cancerous images in the DDSM dataset were not visually distinguishable.
I was able to reach a training accuracy of 99% within 3 epochs for both the CNN and VGG19 models, owing to the large number of images in the dataset.
Testing accuracy was 100% for both CNN and VGG19. Both models classified every test image correctly as benign or cancerous.
