The confidence calibration of deep learning-based perception models plays a crucial role in their reliability. Especially in autonomous driving, downstream tasks such as prediction and planning depend on accurate confidence estimates. In point-wise multiclass classification tasks such as semantic segmentation, the model has to deal with heavy class imbalance. Because classes with small instances are underrepresented, calibrating their confidence is challenging but essential, and not only for safety reasons. We propose a metric that measures the confidence calibration quality of a semantic segmentation model with respect to individual classes. It is calculated by computing sparsification curves for each class based on the uncertainty estimates. We use this classification calibration metric to evaluate uncertainty estimation methods with respect to their confidence calibration of underrepresented classes. We furthermore suggest a second use of the method: automatically finding label problems to improve the quality of hand- or auto-annotated datasets.
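To make the idea of a per-class sparsification curve concrete, the following is a minimal sketch and not the metric as defined in the paper. It assumes that the per-point error is a 0/1 misclassification indicator, that points are grouped by their ground-truth class, and that the curve is summarized as the mean gap to an oracle ordering (a common convention for sparsification plots). The function names `sparsification_curve` and `per_class_sparsification_gap` are hypothetical.

```python
import numpy as np


def sparsification_curve(errors, uncertainty, num_steps=10):
    """Mean error of the points that remain after removing the most
    uncertain fraction, for fractions 0, 1/num_steps, ..., (num_steps-1)/num_steps."""
    order = np.argsort(-uncertainty, kind="stable")  # most uncertain first
    sorted_errors = errors[order].astype(float)
    n = len(sorted_errors)
    fractions = np.linspace(0.0, 1.0, num_steps, endpoint=False)
    curve = np.array([sorted_errors[int(f * n):].mean() for f in fractions])
    return fractions, curve


def per_class_sparsification_gap(labels, preds, uncertainty, cls, num_steps=10):
    """Assumed summary score for one class: mean gap between the
    uncertainty-ordered curve and the oracle (error-ordered) curve.
    0 means the uncertainty ranks the errors of this class perfectly."""
    mask = labels == cls  # assumption: group points by ground-truth class
    if not mask.any():
        return float("nan")
    errors = (preds[mask] != labels[mask]).astype(float)
    _, curve = sparsification_curve(errors, uncertainty[mask], num_steps)
    # Oracle: remove points in order of their true error instead.
    _, oracle = sparsification_curve(errors, errors, num_steps)
    return float(np.mean(curve - oracle))


# Toy usage: class 0 has four points, two of them misclassified.
labels = np.array([0, 0, 0, 0, 1, 1])
preds = np.array([0, 0, 1, 1, 1, 0])
good_unc = np.array([0.1, 0.2, 0.8, 0.9, 0.3, 0.7])  # high where wrong
bad_unc = np.array([0.9, 0.8, 0.2, 0.1, 0.3, 0.7])   # high where correct

gap_good = per_class_sparsification_gap(labels, preds, good_unc, cls=0, num_steps=4)
gap_bad = per_class_sparsification_gap(labels, preds, bad_unc, cls=0, num_steps=4)
```

Under these assumptions, a smaller gap for a class indicates that its uncertainty estimates rank its errors well. Computing the score separately per class keeps rare classes from being dominated by frequent ones.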