Advanced-Driver-Assistance-System

Semantic Segmentation

Semantic segmentation is the process of classifying each pixel in an image into the class it belongs to. It differs from instance segmentation in that it does not distinguish between separate instances of the same class in an image: all pixels that belong to objects of the same class receive the same label.
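As a small illustration of that difference, the sketch below takes a hypothetical instance mask in which two cars carry different instance ids and collapses it into a semantic mask where both cars share one class label. The mask values, the id-to-class mapping and the helper name `to_semantic` are assumptions made for this example, not values taken from the project.

```python
# Hypothetical instance mask: 0 = background, 1 and 2 = two separate cars.
instance_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]

# Assumed mapping from instance id to class id (0 = background, 1 = car).
instance_to_class = {0: 0, 1: 1, 2: 1}


def to_semantic(mask, mapping):
    """Replace every instance id with the id of its class."""
    return [[mapping[pixel] for pixel in row] for row in mask]


semantic_mask = to_semantic(instance_mask, instance_to_class)
for row in semantic_mask:
    print(row)
```

After the conversion both cars are labelled 1, so the mask no longer says how many cars are present, only which pixels are "car".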

It has been used extensively in self-driving cars. For a car to drive autonomously, it should understand its surroundings at the pixel level. In this project, I have used semantic segmentation to identify other vehicles and pedestrians in a given video frame or image. Common classes include pedestrians, cars, trucks, trees (vegetation), sky and buildings.

The dataset used here, CamVid, is popular and was one of the first labelled datasets released for semantic segmentation. The images are extracted from a 10-minute video sequence. The camera was mounted on the dashboard of the car, giving a view close to that of the driver.

The dataset can be downloaded from the Kaggle website https://www.kaggle.com/carlolepelaars/camvid.

Dataset Info

The dataset is divided as:

Training data pairs - 367

Validation data pairs - 101

Test data pairs - 223

The same split has been used for CamVid semantic segmentation in many research papers.

Steps Involved

1. Data-preprocessing
2. Data-loading (using custom dataloaders)
3. Developing the deep learning architecture
4. Training the model
5. Evaluating the model:
   i. Pixel Accuracy
   ii. Intersection over Union
6. Testing for a particular sample image
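The two evaluation metrics listed above can be sketched in plain Python as follows. This is a minimal illustration on flattened label lists, not the notebook's implementation: the function names and sample masks are hypothetical, and the mean IoU here skips classes that appear in neither mask, which is one common convention among several.

```python
def pixel_accuracy(y_true, y_pred):
    """Fraction of pixels whose predicted class matches the ground truth."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_iou(y_true, y_pred, num_classes):
    """Mean of per-class IoU, skipping classes absent from both masks."""
    ious = []
    for c in range(num_classes):
        intersection = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        union = sum(t == c or p == c for t, p in zip(y_true, y_pred))
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Hypothetical flattened masks with three classes (0, 1, 2).
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 2, 0, 0]

print(pixel_accuracy(y_true, y_pred))
print(mean_iou(y_true, y_pred, num_classes=3))
```

Pixel accuracy can look high when a few large classes such as sky or road dominate the image, which is why IoU is usually reported alongside it.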

U-Net Architecture

The advantage of the U-Net architecture is that it is built on a fully convolutional network. It was originally developed for biomedical image segmentation. The network is made of two parts. The first part, the encoder, down-samples the input image into a feature map; the feature map at its last layer is also called the latent space. From the latent space, the decoder up-samples back to the full size of the input image. The number of channels in the output equals the number of classes being predicted. In this task the total number of classes is 32, and the spatial dimensions of the image are 256 x 256 (height x width).
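To make the encoder/decoder shapes concrete, the sketch below traces tensor sizes through a U-Net-style network for a 256 x 256 input and 32 classes. The depth of 4 and the base filter count of 64 are assumptions for this example, not values confirmed by the notebook, and the helper `unet_shapes` is hypothetical. It only does shape arithmetic and builds no model.

```python
def unet_shapes(height, width, num_classes, base_filters=64, depth=4):
    """Return (stage, height, width, channels) tuples for a U-Net-style network.

    Assumes each encoder stage halves the spatial size and doubles the
    channel count, and each decoder stage reverses that.
    """
    shapes = []
    h, w, c = height, width, base_filters
    # Encoder: down-sample towards the latent space.
    for stage in range(depth):
        shapes.append((f"encoder_{stage + 1}", h, w, c))
        h, w, c = h // 2, w // 2, c * 2
    shapes.append(("latent", h, w, c))
    # Decoder: up-sample back to the input resolution.
    for stage in range(depth, 0, -1):
        h, w, c = h * 2, w * 2, c // 2
        shapes.append((f"decoder_{stage}", h, w, c))
    # Final layer: one output channel per class.
    shapes.append(("output", h, w, num_classes))
    return shapes


for name, h, w, c in unet_shapes(256, 256, num_classes=32):
    print(f"{name:10s} {h:>3d} x {w:>3d} x {c}")
```

Under these assumptions the latent space is 16 x 16 and the output is 256 x 256 x 32, so taking the arg-max over the last axis gives one class label per input pixel.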

Training the model

The configurations used for model training are as follows:

Batch Size = 8

Epochs = 100

Metrics - Intersection Over Union, accuracy

Loss - categorical crossentropy
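As a rough illustration of the loss and of what these settings imply, the sketch below computes categorical cross-entropy for a single pixel and the number of optimizer steps for 367 training pairs at batch size 8. It uses 4 classes for brevity instead of 32, the probabilities are made up, and the helper names are hypothetical. A framework would average this loss over all pixels and all samples in the batch.

```python
import math


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_crossentropy(target, probs, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))


# Hypothetical pixel with 4 classes for brevity (the project uses 32).
target = one_hot(2, num_classes=4)
probs = [0.1, 0.1, 0.7, 0.1]
print(categorical_crossentropy(target, probs))  # -ln(0.7)

# Optimizer steps per epoch for 367 training pairs and a batch size of 8,
# assuming the last partial batch is kept.
steps_per_epoch = math.ceil(367 / 8)
print(steps_per_epoch, steps_per_epoch * 100)
```

With a one-hot target, only the probability assigned to the true class contributes, so the loss falls to zero as that probability approaches 1.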

Loss

A video named output_predictions1.avi, assembled from the predicted images as video frames, is uploaded in the repository.
