In recent years, several losses, such as cross-entropy and Dice, have emerged as de facto standards for supervising neural networks in fully supervised semantic segmentation. The Dice loss is an interesting case, as it is a relaxation of the popular Dice coefficient, one of the main evaluation metrics in medical imaging applications. In this paper, we first study the gradient of the Dice loss theoretically, showing that it is, concretely, a weighted negative of the ground truth with a very small dynamic range. This enables us, in the second part of the paper, to mimic the supervision of the Dice loss through a simple element-wise multiplication of the network output with a negative of the ground truth. This rather surprising result sheds light on the supervision that the Dice loss actually performs during gradient descent. It can help practitioners understand and interpret their results, and guide researchers in designing new losses.
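To make the claim about the gradient concrete, the sketch below assumes one common soft Dice formulation for a single class, L = 1 - 2 * sum(s * y) / (sum(s) + sum(y)), where s holds predicted probabilities and y is a binary mask. This formulation, the smoothing term eps, and the constants a and b in mimic_loss are illustrative assumptions and may differ from the definitions used in the paper. Under these assumptions the gradient with respect to s takes only two values, one negative value shared by all foreground pixels and one non-negative value shared by all background pixels, and a loss that is linear in s reproduces a gradient with the same structure.

```python
import numpy as np


def soft_dice_loss(s, y, eps=1e-8):
    """Soft Dice loss for one class (assumed form: 1 - 2|s*y| / (|s| + |y|))."""
    inter = np.sum(s * y)
    denom = np.sum(s) + np.sum(y) + eps
    return 1.0 - 2.0 * inter / denom


def soft_dice_grad(s, y, eps=1e-8):
    """Analytic gradient of soft_dice_loss with respect to s.

    d/ds_i [1 - 2 I / D] = -2 (y_i D - I) / D^2, with I = sum(s * y)
    and D = sum(s) + sum(y) + eps.
    """
    inter = np.sum(s * y)
    denom = np.sum(s) + np.sum(y) + eps
    return -2.0 * (y * denom - inter) / denom**2


def mimic_loss(s, y, a=1.0, b=1.0):
    """Hypothetical linear surrogate: element-wise product of s with a signed mask.

    The weights a (foreground) and b (background) are illustrative constants,
    not values taken from the paper. The gradient with respect to s is the mask.
    """
    w = np.where(y == 1, -a, b)
    return np.sum(w * s)


def numerical_grad(f, s, h=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    g = np.zeros_like(s)
    for i in range(s.size):
        d = np.zeros_like(s)
        d.flat[i] = h
        g.flat[i] = (f(s + d) - f(s - d)) / (2.0 * h)
    return g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.uniform(0.05, 0.95, size=(8, 8))            # predicted probabilities
    y = (rng.uniform(size=(8, 8)) > 0.7).astype(float)  # binary ground truth
    g = soft_dice_grad(s, y)
    print("distinct gradient values:", np.unique(g))
```

In this formulation, the two gradient values depend on the image only through the global sums I and D, which is one way to read the "weighted negative of the ground truth" description above.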