Deep neural networks for medical image segmentation often produce overconfident predictions that are misaligned with empirical observations. Such miscalibration challenges their clinical translation. We propose using the marginal L1 average calibration error (mL1-ACE) as a novel auxiliary loss function to improve pixel-wise calibration without compromising segmentation quality. We show that this loss, despite using hard binning, is directly differentiable, bypassing the need for approximate but differentiable surrogates or soft binning approaches. Our work also introduces the concept of dataset reliability histograms, which generalise standard reliability diagrams for refined visual assessment of calibration in semantic segmentation, aggregated at the dataset level. Using mL1-ACE, we reduce average and maximum calibration error by 45% and 55%, respectively, while maintaining a Dice score of 87% on the BraTS 2021 dataset. We share our code here: https://github.com/cai4cai/ACE-DLIRIS
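To make the claim about hard binning and differentiability concrete, the following is a minimal sketch of one plausible reading of mL1-ACE. It is not the implementation from the linked repository. The function name `ml1_ace_with_grad`, the use of equal-width bins, the averaging over non-empty bins only, and the treatment of each class probability as a free variable (ignoring the softmax constraint) are all assumptions made for illustration. Plain Python is used instead of a deep learning framework so that the gradient is visible: bin membership is piecewise constant, while each bin's mean confidence is an ordinary average of the predicted probabilities.

```python
def ml1_ace_with_grad(probs, labels, num_bins=10):
    """Return (loss, grad) for a hard-binned marginal L1 ACE (illustrative).

    probs  : per-pixel class probabilities, nested lists of shape [N][C]
    labels : list of N integer class ids
    grad   : d(loss)/d(probs[i][c]) with bin membership held fixed
    """
    n, num_classes = len(probs), len(probs[0])
    grad = [[0.0] * num_classes for _ in range(n)]
    loss = 0.0
    for c in range(num_classes):
        # Hard binning: every pixel lands in exactly one equal-width bin.
        members = [[] for _ in range(num_bins)]
        for i in range(n):
            b = min(int(probs[i][c] * num_bins), num_bins - 1)
            members[b].append(i)
        non_empty = [m for m in members if m]
        # Assumption: bins are weighted equally, classes are weighted equally.
        weight = 1.0 / (len(non_empty) * num_classes)
        for m in non_empty:
            mean_conf = sum(probs[i][c] for i in m) / len(m)
            freq = sum(1.0 for i in m if labels[i] == c) / len(m)
            gap = mean_conf - freq
            loss += abs(gap) * weight
            # |gap| depends on probs only through the bin's mean confidence,
            # so each member gets sign(gap) / bin_size (subgradient 0 at gap == 0).
            sign = (gap > 0) - (gap < 0)
            for i in m:
                grad[i][c] = sign * weight / len(m)
    return loss, grad


# Toy example: four "pixels", two classes.
probs = [[0.95, 0.05], [0.95, 0.05], [0.25, 0.75], [0.25, 0.75]]
labels = [0, 1, 1, 1]
loss, grad = ml1_ace_with_grad(probs, labels)
```

In this toy example the loss is about 0.35, and the gradient pushes the overconfident class-0 probabilities down and the underconfident class-1 probabilities up. The gradient is exact only while a perturbation keeps each probability inside its current bin. At bin boundaries the loss is discontinuous, as it is for any hard-binned quantity.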