Motivated by the growing popularity of transformers in computer vision, novel architectures have been developed at a rapid pace in recent years. While in-domain performance follows a steady upward trend, properties such as robustness and uncertainty estimation are less explored, leaving doubts about whether model reliability has advanced as well. Studies along these axes exist, but they are mainly limited to classification models. In contrast, we carry out a study on semantic segmentation, a task relevant to many real-world applications in which model reliability is paramount. We analyze a broad variety of models, ranging from older ResNet-based architectures to novel transformers, and assess their reliability based on four metrics: robustness, calibration, misclassification detection, and out-of-distribution (OOD) detection. We find that while recent models are significantly more robust, they are not overall more reliable in terms of uncertainty estimation. We further explore methods that can address this shortcoming and show that improving calibration can also help with other uncertainty metrics such as misclassification or OOD detection. This is the first study on modern segmentation models focused on both robustness and uncertainty estimation, and we hope it will help practitioners and researchers interested in this fundamental vision task. Code is available at https://github.com/naver/relis.
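To make the notion of calibration concrete, the following minimal sketch computes the expected calibration error (ECE), a commonly used calibration measure, over flattened per-pixel predictions. It is an illustration only and not the implementation in the linked repository: the abstract does not state which calibration measure the study uses, and the function name, its parameters, and the choice of 10 equal-width bins are assumptions made here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE (hypothetical helper, not taken from the paper's code).

    confidences: per-pixel maximum softmax probabilities in [0, 1].
    correct:     per-pixel booleans, True where the predicted class is right.
    Returns the bin-size-weighted mean of |accuracy - mean confidence|.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for c, ok in zip(confidences, correct):
        # Equal-width bins [k/n_bins, (k+1)/n_bins); a confidence of 1.0
        # is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue  # empty bins contribute nothing
        accuracy = hit_sum[k] / count[k]
        mean_conf = conf_sum[k] / count[k]
        ece += (count[k] / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Toy example: four "pixels" with made-up confidences and correctness flags.
    print(expected_calibration_error([0.95, 0.95, 0.55, 0.55],
                                     [True, True, True, False]))
```

A lower value means confidence tracks accuracy more closely. For a segmentation model, the inputs would typically be gathered over all valid pixels of an evaluation set, which is again an assumption about the setup rather than a description of the study's protocol.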