In the GCT segmentation task, I see that the validation metrics are reported separately for the left and right models, specifically l-metric-mIOU and r-metric-mIOU. Which model's metric should I use when comparing against other baselines such as self-training and self-training++?