The goal of this project is to provide a platform for building and training machine learning models for medical image
classification using Google Cloud's Vertex AI.
This project uses mammography (breast cancer) images, but other types of medical image datasets
should also work after tweaking the model parameters.
Dataset
The dataset was obtained from the Mendeley website here.
The dataset at the link above contains the DDSM and INBREAST datasets, plus a directory in which DDSM, INBREAST and MIAS
are combined. Only the DDSM dataset was used in this project.
Preparing the training dataset
No augmentation was done because the dataset was already augmented. If augmentation is required, a procedure is available
here.
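The dataset used here was already augmented, so this step was skipped. For a dataset that does need it, the following is a minimal sketch of simple geometric augmentation with NumPy, assuming each image is already loaded as an H x W (or H x W x C) array. The `augment` helper is hypothetical and is not the procedure linked above.

```python
from typing import List

import numpy as np


def augment(image: np.ndarray) -> List[np.ndarray]:
    """Return simple geometric variants of one image (H x W or H x W x C).

    Hypothetical helper for illustration: only flips and 90-degree rotations.
    """
    variants = [image]
    variants.append(np.fliplr(image))  # mirror left-right (axis 1)
    variants.append(np.flipud(image))  # mirror top-bottom (axis 0)
    for k in (1, 2, 3):
        variants.append(np.rot90(image, k))  # rotate by k * 90 degrees
    return variants


if __name__ == "__main__":
    sample = np.arange(6).reshape(2, 3)
    print([v.shape for v in augment(sample)])
```

Flips and right-angle rotations keep every pixel value intact, so no interpolation artifacts are introduced; whether each transform is appropriate for a given imaging modality is a separate judgment.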
Open the notebook train_data_creator.ipynb located in tools/preprocessors.
The input data folder should contain one subfolder per class, each holding that class's images.
Give an empty directory as the destination path and run the notebook.
Once the notebook has run completely, the following files and folders will be created in the destination directory:
all_images.zip: all images to be used for training, in zipped format.
train: two .npy files, one containing the training image names and one containing their labels.
val: two .npy files, one containing the validation image names and one containing their labels.
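The notebook itself is not reproduced here. As a rough sketch of the kind of output it produces, the function below zips every image from a one-folder-per-class input directory and saves shuffled train/val arrays of image names and labels. The function name, the `images.npy`/`labels.npy` file names, the 80/20 split, the accepted extensions and the "label = index of the class in sorted order" convention are all assumptions, not necessarily what train_data_creator.ipynb does.

```python
import os
import tempfile
import zipfile
from typing import List

import numpy as np

# Assumed image extensions; adjust to match the dataset.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def create_train_data(input_dir: str, dest_dir: str,
                      val_fraction: float = 0.2, seed: int = 42) -> List[str]:
    """Hypothetical re-creation of the notebook's outputs.

    input_dir has one subfolder per class. Writes all_images.zip plus
    train/ and val/ folders, each holding an image-name array and a label array.
    Returns the class names; a class's label is its index in this list.
    """
    os.makedirs(dest_dir, exist_ok=True)
    classes = sorted(d for d in os.listdir(input_dir)
                     if os.path.isdir(os.path.join(input_dir, d)))
    names, labels = [], []
    with zipfile.ZipFile(os.path.join(dest_dir, "all_images.zip"), "w") as zf:
        for label, cls in enumerate(classes):
            cls_dir = os.path.join(input_dir, cls)
            for fname in sorted(os.listdir(cls_dir)):
                if os.path.splitext(fname)[1].lower() not in IMAGE_EXTS:
                    continue
                arcname = f"{cls}/{fname}"  # class prefix avoids name clashes
                zf.write(os.path.join(cls_dir, fname), arcname)
                names.append(arcname)
                labels.append(label)

    names_arr, labels_arr = np.array(names), np.array(labels)
    # Shuffle once, then take the first n_val entries as the validation split.
    order = np.random.default_rng(seed).permutation(len(names_arr))
    n_val = int(len(order) * val_fraction)
    for split, idx in (("val", order[:n_val]), ("train", order[n_val:])):
        split_dir = os.path.join(dest_dir, split)
        os.makedirs(split_dir, exist_ok=True)
        # File names are assumptions; the notebook may use different ones.
        np.save(os.path.join(split_dir, "images.npy"), names_arr[idx])
        np.save(os.path.join(split_dir, "labels.npy"), labels_arr[idx])
    return classes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for cls in ("benign", "cancer"):
            os.makedirs(os.path.join(src, cls))
            for i in range(5):
                # Empty placeholder files stand in for real images.
                open(os.path.join(src, cls, f"img{i}.png"), "wb").close()
        print(create_train_data(src, dst))
        print(np.load(os.path.join(dst, "val", "images.npy")))
```

Storing image names (rather than pixel arrays) in the .npy files keeps them small; a training loader can then read each named image out of all_images.zip.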
The entire destination folder needs to be uploaded to Google Cloud Storage before training starts.
Training models locally
Install the packages listed in requirements_baremetal.txt:
pip install -r requirements_baremetal.txt
Open the config file at config/config.yaml and update it as needed.
Go to the project root and run the following. It sets the environment variable
PYTHONPATH to the project root so that the project's modules can be imported.
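In a POSIX shell, setting PYTHONPATH from the project root is typically done with `export PYTHONPATH=$(pwd)`. When the training code is instead launched from Python (for example from a notebook cell via `subprocess`), the same effect can be achieved by passing a modified environment to the child process. The helper below is a hypothetical sketch of that approach, not part of this repository.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_project_root(project_root: str,
                          base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with project_root prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)  # never mutate the input
    root = os.path.abspath(project_root)
    existing = env.get("PYTHONPATH")
    # Keep any existing entries, but let the project root take precedence.
    env["PYTHONPATH"] = root + os.pathsep + existing if existing else root
    return env


if __name__ == "__main__":
    print(env_with_project_root(".")["PYTHONPATH"])
```

The returned dictionary can be passed as `env=` to `subprocess.run`, so any script started that way can import the project's packages (for example `tools.preprocessors`) regardless of its own location.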
Training models on Vertex AI
Go to the Google Cloud console and create an AI Notebooks instance.
If you don't know how to do that, follow the procedure given here.
(Create the notebook with low specifications, since the actual training will not run on it.
It only acts as a base machine for submitting the job to Vertex AI.
A good choice is an n1-standard-2 machine, which has 7.5 GB of memory and 2 vCPUs.)
Create a Google Cloud Storage bucket. If you don't know how to do that,
follow the procedure given here.
Upload the training dataset folder, which contains the all_images.zip file along with the 'train' and 'val'
folders holding the .npy files, to the newly created bucket.
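The upload can be done through the Cloud console or with `gsutil -m cp -r`. For a scripted upload from Python, a minimal sketch using the google-cloud-storage client library is shown below. It assumes that library is installed and that default credentials are available (as they usually are on an AI Notebooks instance); the helper names `upload_plan` and `upload_dir`, the bucket name and the prefix are hypothetical.

```python
import os
import tempfile
from typing import List, Tuple


def upload_plan(local_dir: str, prefix: str = "") -> List[Tuple[str, str]]:
    """Map each file under local_dir to a blob name that keeps the folder layout.

    The folder itself becomes the top-level blob "directory", so
    <local_dir>/train/x.npy -> <prefix>/<folder name>/train/x.npy.
    """
    base = os.path.basename(os.path.normpath(local_dir))
    plan = []
    for root, _dirs, files in os.walk(local_dir):
        for fname in files:
            path = os.path.join(root, fname)
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            blob_name = "/".join(p for p in (prefix.strip("/"), base, rel) if p)
            plan.append((path, blob_name))
    return sorted(plan, key=lambda pair: pair[1])


def upload_dir(local_dir: str, bucket_name: str, prefix: str = "") -> None:
    """Upload the folder with the google-cloud-storage client library."""
    from google.cloud import storage  # imported lazily; needs google-cloud-storage

    bucket = storage.Client().bucket(bucket_name)
    for path, blob_name in upload_plan(local_dir, prefix):
        bucket.blob(blob_name).upload_from_filename(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dataset = os.path.join(tmp, "dataset")
        os.makedirs(os.path.join(dataset, "train"))
        open(os.path.join(dataset, "all_images.zip"), "wb").close()
        open(os.path.join(dataset, "train", "labels.npy"), "wb").close()
        for _, blob_name in upload_plan(dataset):
            print(blob_name)
```

Computing the plan separately from the upload makes it easy to check which gs:// paths will be created before anything is sent; for the folder above in a hypothetical bucket named my-bucket, the training files would end up under gs://my-bucket/dataset/.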
Open the config file at config/config.yaml and update it as needed. Make sure to give full paths
starting with 'gs://' when specifying paths inside the bucket.
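Because a path missing its 'gs://' prefix may only surface as an error once the job is already running on Vertex AI, it can help to check the config before submitting. The sketch below walks a parsed config and reports path-like keys whose values do not start with 'gs://'. The layout of config/config.yaml is not shown here, so the key names in the example and the "key ends with path or dir" heuristic are assumptions; `load_config` relies on PyYAML's `yaml.safe_load`.

```python
from typing import Any, Dict, List, Tuple


def find_non_gcs_paths(config: Dict[Any, Any],
                       key_suffixes: Tuple[str, ...] = ("path", "dir"),
                       _prefix: str = "") -> List[str]:
    """Return dotted keys whose path-like values do not start with 'gs://'.

    Heuristic: a key counts as a path if its name ends with one of key_suffixes.
    """
    problems = []
    for key, value in config.items():
        full_key = f"{_prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections, e.g. data: {train_path: ...}
            problems.extend(find_non_gcs_paths(value, key_suffixes, full_key + "."))
        elif (isinstance(value, str)
              and str(key).lower().endswith(key_suffixes)
              and not value.startswith("gs://")):
            problems.append(full_key)
    return problems


def load_config(path: str) -> Dict[Any, Any]:
    """Parse config.yaml; needs PyYAML (imported lazily)."""
    import yaml

    with open(path) as f:
        return yaml.safe_load(f)


if __name__ == "__main__":
    # Hypothetical keys; the real config.yaml may be laid out differently.
    cfg = {"data": {"train_path": "gs://my-bucket/dataset/train",
                    "val_path": "dataset/val"},
           "output_dir": "gs://my-bucket/output",
           "epochs": 3}
    print(find_non_gcs_paths(cfg))  # ['data.val_path']
```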
Classification results will be printed on the screen. If the data path is a directory,
all images inside it and its subdirectories will be read and run against the model.
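The "file or directory, searched recursively" behavior described above can be sketched as follows. This only covers collecting the input files, not loading the model or running predictions. The function name `collect_images` and the list of accepted extensions are assumptions, not taken from the repository.

```python
import os
import tempfile
from typing import List

# Assumed set of image extensions (matched case-insensitively).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff")


def collect_images(data_path: str) -> List[str]:
    """Return data_path itself if it is a file, else every image found recursively."""
    if os.path.isfile(data_path):
        return [data_path]
    found = []
    for root, dirs, files in os.walk(data_path):
        dirs.sort()  # walk subdirectories in a stable order
        for fname in sorted(files):
            if fname.lower().endswith(IMAGE_EXTENSIONS):
                found.append(os.path.join(root, fname))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "sub"))
        for rel in ("a.png", "notes.txt", os.path.join("sub", "b.JPG")):
            open(os.path.join(tmp, rel), "wb").close()
        for path in collect_images(tmp):
            print(os.path.relpath(path, tmp))
```

Sorting both directories and files makes the output order deterministic, so repeated runs over the same folder print results in the same sequence.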
Results
The benign and cancerous DDSM images were not visually distinguishable.
Both the CNN and VGG19 models reached a training accuracy of 99% within 3 epochs,
owing to the large number of images in the dataset.
Testing accuracy was 100% for both CNN and VGG19: they classified every test image correctly as
benign or cancerous.
About
To classify medical images as benign or cancerous using Google Cloud's Vertex AI.