Despite their impressive predictive performance in various computer vision tasks, deep neural networks (DNNs) tend to make overly confident predictions, which hinders their widespread use in safety-critical applications. While there have been recent attempts to calibrate DNNs, most of these efforts have focused on classification tasks, neglecting DNN-based object detectors. Although several recent works have addressed calibration for object detection and proposed differentiable penalties, none of these penalties is a consistent estimator of an established notion of calibration. In this work, we tackle the challenge of defining and estimating calibration error for object detection. In particular, we adapt the definition of classification calibration error to handle the nuances of object detection and, more generally, of prediction in structured output spaces. Furthermore, we propose a consistent and differentiable estimator of the detection calibration error based on kernel density estimation. Our experiments show that our estimator is effective compared with competing train-time and post-hoc calibration methods, while detection performance remains similar.
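To make the idea of a kernel-based calibration error estimate concrete, the following is a minimal sketch and not the estimator proposed in this work. It assumes that each detection has already been reduced to a scalar confidence and a binary correctness label (for example, via an IoU-based matching rule), and it uses a Gaussian kernel with a hand-picked bandwidth. The function name `kde_calibration_error` and the parameters `bandwidth` and `p` are hypothetical choices made for illustration. The sketch estimates the conditional accuracy at each confidence by leave-one-out kernel regression and averages the p-th power of its gap to the confidence.

```python
import math


def kde_calibration_error(confidences, correct, bandwidth=0.1, p=2):
    """Leave-one-out kernel estimate of E[|E[correct | conf] - conf|^p].

    Illustrative assumptions (not taken from the paper): scalar confidences,
    binary correctness labels, Gaussian kernel, fixed bandwidth.
    """
    n = len(confidences)
    total = 0.0
    for i in range(n):
        weighted_correct = 0.0
        weight_sum = 0.0
        for j in range(n):
            if j == i:
                continue  # leave the i-th sample out of its own estimate
            u = (confidences[i] - confidences[j]) / bandwidth
            w = math.exp(-0.5 * u * u)  # unnormalized Gaussian kernel weight
            weighted_correct += w * correct[j]
            weight_sum += w
        # Kernel-regression estimate of accuracy at confidence i; the small
        # constant avoids division by zero when all weights underflow.
        acc_hat = weighted_correct / (weight_sum + 1e-12)
        total += abs(acc_hat - confidences[i]) ** p
    return total / n


# Usage: half of the detections are correct in both cases, so a confidence
# of 0.9 is further from the observed accuracy than a confidence of 0.5.
labels = [1, 0, 1, 0]
print(kde_calibration_error([0.5] * 4, labels))
print(kde_calibration_error([0.9] * 4, labels))
```

Every operation in the sketch is smooth in the confidences wherever the gap is nonzero, so the same computation written in an automatic differentiation framework could in principle serve as a training penalty. The kernel and bandwidth used here are placeholders.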