We study the problem of confidence calibration in semantic segmentation. Many solutions have been proposed to address confidence miscalibration in image classification. However, to date, research on confidence calibration for semantic segmentation remains limited. We provide a systematic study of the calibration of semantic segmentation models and propose a simple yet effective approach. First, we find that model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration. Among these, prediction correctness, and misprediction in particular, contributes most to miscalibration because mispredictions are over-confident. Next, we propose a simple, unifying, and effective approach, namely selective scaling, which separates correct and incorrect predictions for scaling and focuses more on smoothing the logits of mispredictions. Then, we study popular existing calibration methods and compare them with selective scaling on semantic segmentation calibration. We conduct extensive experiments on a variety of benchmarks for both in-domain and domain-shift calibration, and show that selective scaling consistently outperforms other methods.
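As a rough illustration of the idea behind selective scaling, the sketch below applies a larger softmax temperature to pixels flagged as likely mispredictions and leaves the remaining pixels unscaled. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the temperature values `t_wrong` and `t_correct`, and the `flagged_wrong` mask (assumed to come from some separate correctness estimator) are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def selective_scaling(pixel_logits, flagged_wrong, t_wrong=2.0, t_correct=1.0):
    """Scale logits per pixel before the softmax.

    pixel_logits: list of per-pixel class-logit lists.
    flagged_wrong: list of bools, True where the prediction is believed wrong
                   (hypothetical input; how it is obtained is not shown here).
    t_wrong, t_correct: illustrative temperatures, not values from the paper.
    """
    probs = []
    for logits, wrong in zip(pixel_logits, flagged_wrong):
        t = t_wrong if wrong else t_correct
        # Dividing by t > 1 flattens the distribution and lowers confidence.
        probs.append(softmax([x / t for x in logits]))
    return probs


# Two toy pixels with three classes each; the second is flagged as wrong.
pixel_logits = [[4.0, 1.0, 0.0], [3.0, 2.5, 0.0]]
flagged_wrong = [False, True]
calibrated = selective_scaling(pixel_logits, flagged_wrong)
```

In this sketch the predicted class of every pixel is unchanged, since dividing logits by a positive temperature preserves their ordering; only the confidence assigned to flagged pixels is reduced.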