In our presented solution (described below), we outperform the state-of-the-art FADA results on Cityscapes to Cross-City domain adaptation (real-to-real) for all but one city (Tokyo). The following table shows the performance achieved in our experiments:

The following examples visualise our performance -- predicted segmentation maps:
| Taipei - raw image | Taipei - prediction |
|---|---|
| ![]() | ![]() |

| Tokyo - raw image | Tokyo - prediction |
|---|---|
| ![]() | ![]() |

| Rome - raw image | Rome - prediction |
|---|---|
| ![]() | ![]() |

| Rio - raw image | Rio - prediction |
|---|---|
| ![]() | ![]() |
- You have to install the Docker service from the official website: https://docs.docker.com/engine/install/ubuntu/,
- to be consistent with NVIDIA drivers, I highly recommend installing nvidia-docker2 as well with `sudo apt install nvidia-docker2` and following the official guideline: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker,
- after installing both Docker services, restart the Docker service if it is running (`sudo systemctl restart docker`) or reboot your machine.
The whole project -- Domain Adaptation for Semantic Segmentation -- is packaged in a Docker container together with the corresponding libraries/tools. Moreover, a web application was implemented to easily manage the project (training process).
- The root directory contains a Dockerfile that defines all the dependencies needed to run the project,
- use the command line and move to the root directory of the project containing the mentioned Dockerfile,
- to build the Docker image, use the following command:

  `docker build -t itri:v0.1a .` (do NOT forget to type the dot as the last character). This command will prepare a runnable container with the specific tag `itri:v0.1a` (name:version). Please keep this name so we can recognise the project version in the future,
- once you have built the image, you should set up the network bridge to be able to run the prepared web application. The network is created by this command:

  `docker network create --subnet=172.18.0.0/16 webapp_net`

  so the container will use this local subnet address. Please do NOT change this address if not necessary. The web application listens on a particular static IP address from this range.
- To run the Docker container correctly, you have to define two paths:
  - a path to the datasets directory on your local machine,
  - a path to an output directory for the experiments, so that all files/models/logs are stored directly on the local machine.
- Use the following command:

  `sudo nvidia-docker run --ipc=host --network webapp_net --ip 172.18.0.22 --gpus=all -v <path_to_dataset_dir>:/workspace/datasets/ -v <path_to_output_dir>:/workspace/experiments/ -d itri:v0.1a`

  For example: `<path_to_dataset_dir>=/media/hdd/datasets/` -- it depends on the location on your local machine!
- The web application (implemented in NodeJS) starts automatically when the Docker container starts,
- you can access the web application via http://172.18.0.22:46351/,
- if you have changed the static IP address of the Docker container, you also have to change the HOST address in the `server.js` file:
```js
'use strict';
// Constants ----- you can modify the host and port constants!
const PORT = 46351;
const HOST = '172.18.0.22';
// ...
```

The main objective of domain adaptation is to generalise a segmentation model (predictor) by leveraging a set of labelled source data (source domain) and unlabelled target data (target domain) in such a way that the domain shift (gap) is minimised. In other words, we have ground truths (labels) available for the source domain but none for the target domain. Thus, we are not able to perform supervised learning on our desired target domain.
The crucial assumption is that the distributions of both domains are as close as possible. Both domains share the same set of K classes (categories); if the target domain contains a class that is not included in the source domain, we will not be able to predict that particular class. We want to make use of the features extracted from the source domain and transform them into the target-domain context. The domain shift can be understood as a significant difference in the test error rate between the source and the target domains for the same context (categories). Consequently, the model should achieve high confidence in its predictions in terms of the segmentation map (accurate pixel predictions).
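The objective described above can be sketched formally as follows (the notation is ours, not taken from the project code): with labelled source data $(X_s, Y_s)$, unlabelled target data $X_t$, and a segmentation network $F_\theta$, a common adversarial formulation minimises

```latex
\min_{\theta}\;
\underbrace{\mathcal{L}_{seg}\big(F_\theta(X_s),\, Y_s\big)}_{\text{supervised loss on source}}
\;+\;
\lambda\,
\underbrace{\mathcal{L}_{adv}\big(F_\theta(X_t)\big)}_{\text{aligns target with source via a discriminator}}
```

where $\mathcal{L}_{seg}$ is the usual cross-entropy on source labels, $\mathcal{L}_{adv}$ is the adversarial term driven by a domain discriminator, and $\lambda$ balances the two.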
We have collected several available datasets for the image semantic segmentation task in the self-driving car environment. We implemented data loaders, made the labels compatible across datasets, and trained the models for the following datasets:

- Mapillary
- Cityscapes
- GTA5
- CARLA
These datasets are publicly available for academic purposes, but the licenses forbid us from sharing them without the permission of the owners. To get these datasets, follow the links above and register on the official websites to obtain the download links.
The target domain datasets are generally missing labels for the training set but offer labels for the validation set, so we are able to evaluate the performance of our trained models.
The following datasets were used as the target domain:

- Cross-City (NTHU) - https://yihsinchen.github.io/segmentation_adaptation_dataset/ - does not provide labels for the training set,
- Cityscapes - the same link as above; used for the GTA5 to Cityscapes experiments.
We start directly with adversarial training, where we train the segmentation model on the source domain (in a supervised way) and subsequently on the target domain using a discriminator model.
Once the adversarial learning achieves high performance, we generate pseudo labels for the target domain using an entropy-based approach. Additionally, the pseudo labels are divided into two splits -- easy and hard. The easy split contains high-quality pseudo labels, while the hard split contains low-confidence ones.
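The entropy-based split can be sketched as follows. This is an illustrative numpy version (the function name, the per-image mean-entropy ranking, and the split ratio are our assumptions, not the project's exact implementation):

```python
import numpy as np

def split_pseudo_labels(probs, entropy_ratio=0.67):
    """Rank target images by mean prediction entropy and split them into
    an easy split (low entropy, confident) and a hard split (high entropy).

    probs: (N, K, H, W) softmax outputs for N target images, K classes.
    Returns (pseudo_labels, easy_indices, hard_indices).
    """
    eps = 1e-12
    # Per-pixel entropy over the K class probabilities: (N, H, W)
    ent = -(probs * np.log(probs + eps)).sum(axis=1)
    # One confidence score per image: mean entropy over all pixels
    scores = ent.mean(axis=(1, 2))
    # Low entropy first -> most confident images come first
    order = np.argsort(scores)
    n_easy = int(len(order) * entropy_ratio)
    easy, hard = order[:n_easy], order[n_easy:]
    # Pseudo labels are simply the argmax class per pixel
    pseudo_labels = probs.argmax(axis=1)
    return pseudo_labels, easy, hard
```

A near-uniform prediction has high entropy and lands in the hard split; a sharply peaked one lands in the easy split.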
In the last stage, we perform self-training (supervised learning) on the target domain with the generated pseudo labels from the easy split. This allows the model to adapt its features in the context of the target domain.
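The self-training loss is then an ordinary cross-entropy against the pseudo labels, restricted to the confident pixels. A minimal numpy sketch (function name and the binary confidence mask are our assumptions for illustration):

```python
import numpy as np

def masked_cross_entropy(probs, pseudo_labels, mask):
    """Cross-entropy on target images against pseudo labels, averaged
    only over pixels selected by a binary confidence mask.

    probs:         (N, K, H, W) softmax outputs.
    pseudo_labels: (N, H, W) integer class indices (the pseudo labels).
    mask:          (N, H, W) with 1 for trusted (easy) pixels, else 0.
    """
    eps = 1e-12
    # Pick the predicted probability of the pseudo-label class per pixel
    idx = pseudo_labels[:, None]                        # (N, 1, H, W)
    p = np.take_along_axis(probs, idx, axis=1)[:, 0]    # (N, H, W)
    nll = -np.log(p + eps)
    # Average the negative log-likelihood over masked pixels only
    return (nll * mask).sum() / max(mask.sum(), 1)
```

Masking out the hard pixels keeps noisy pseudo labels from dominating the gradient during self-training.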
We take advantage of FADA's fine-grained (class-wise) discriminator concept for the adversarial learning stage.
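The core idea of the fine-grained discriminator is that, instead of a single binary source/target label, each pixel gets a 2K-dimensional soft domain label built from the class probabilities. A simplified sketch of how such a target could be constructed (this is our own illustration of the concept, not FADA's exact code):

```python
import numpy as np

def class_wise_domain_target(probs, is_source):
    """Build a 2K-channel soft domain label for a class-wise discriminator.

    probs:     (N, K, H, W) class probabilities for the pixels.
    is_source: True for source-domain images, False for target-domain.
    Channels 0..K-1 encode 'source, class k'; K..2K-1 encode 'target, class k'.
    """
    N, K, H, W = probs.shape
    target = np.zeros((N, 2 * K, H, W))
    if is_source:
        target[:, :K] = probs   # mass on the source half, spread class-wise
    else:
        target[:, K:] = probs   # mass on the target half, spread class-wise
    return target
```

Because the domain label is spread across classes, the discriminator aligns the two domains per category rather than globally, which is what makes the alignment "fine-grained".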
For the segmentation model, we fully exploit the DeepLabV3+ framework with ResNet-101 as a backbone. We made several changes to correctly obtain low- and high-level features from the model.









