We study confidence calibration for semantic segmentation. Many methods have been proposed to alleviate confidence miscalibration in image classification; however, calibration research on semantic segmentation remains limited. We provide a systematic study of the calibration of semantic segmentation models and propose a simple yet effective approach. First, we find that model capacity, crop size, multi-scale testing, and prediction correctness all affect calibration. Among these, prediction correctness contributes more to miscalibration, chiefly because mispredictions are over-confident. Next, we propose a simple, unifying, and effective approach, selective scaling, which scales correct and incorrect predictions separately and focuses on smoothing the logits of mispredictions. Then, we study popular existing calibration methods and compare them with selective scaling on semantic segmentation. We conduct extensive experiments on a variety of benchmarks, covering both in-domain and domain-shift calibration, and show that selective scaling consistently outperforms the other methods.
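As a rough illustration of the idea behind selective scaling, the following Python sketch applies a separate softmax temperature to pixels flagged as likely correct and to pixels flagged as likely mispredicted, so that only the latter are smoothed. The abstract does not say how the correct/incorrect split is obtained or which temperatures are used, so everything here is an assumption made for illustration: the function names `softmax` and `selective_scaling`, the per-pixel `likely_correct` flags, and the values `t_correct=1.0` and `t_incorrect=2.0` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a temperature above 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def selective_scaling(
    pixel_logits: Sequence[Sequence[float]],
    likely_correct: Sequence[bool],
    t_correct: float = 1.0,
    t_incorrect: float = 2.0,
) -> List[List[float]]:
    """Calibrate each pixel with a temperature chosen by its correctness flag.

    pixel_logits   : one logit vector per pixel (a flattened image).
    likely_correct : hypothetical per-pixel flags, True where the prediction
                     is believed to be correct; how these flags are obtained
                     is outside the scope of this sketch.
    t_correct, t_incorrect : illustrative temperatures, not tuned values.
    """
    if len(pixel_logits) != len(likely_correct):
        raise ValueError("need exactly one correctness flag per pixel")
    return [
        softmax(logits, t_correct if ok else t_incorrect)
        for logits, ok in zip(pixel_logits, likely_correct)
    ]


# Two pixels with identical logits; only the second is flagged as a likely misprediction.
logits = [[4.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
probs = selective_scaling(logits, [True, False])
confidences = [max(p) for p in probs]  # top-class probability per pixel
```

In this toy input the predicted class is the same for both pixels, but the flagged pixel ends up with a lower top-class probability. That is the intended effect of smoothing over-confident mispredictions while leaving the other predictions untouched.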