Developing automatic segmentation techniques for medical imaging requires assessment metrics that judge and rank competing approaches fairly on benchmarks. The Dice Similarity Coefficient (DSC) is a popular choice for measuring the agreement between a predicted segmentation and a ground-truth mask. However, DSC has been shown to be biased by the occurrence rate of the positive class in the ground truth, and hence should be considered in combination with other metrics. This work presents a detailed analysis of the recently proposed normalised Dice Similarity Coefficient (nDSC) for binary segmentation tasks, an adaptation of DSC that scales the precision at a fixed recall rate to tackle this bias. White matter lesion segmentation on magnetic resonance images of multiple sclerosis patients is selected as a case study to assess the suitability of nDSC empirically. We validate nDSC using two different models across 59 subject scans with a wide range of lesion loads. On standard white matter lesion segmentation benchmarks, nDSC is found to be less biased by lesion load than DSC, as measured by standard rank correlation coefficients. An implementation of nDSC is publicly available (https://github.com/NataliiaMolch/nDSC).
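To make the idea of scaling the precision concrete, the following is a minimal Python sketch of DSC and one possible form of nDSC on flat binary masks. The abstract does not state the formula, so everything specific here is an assumption made for illustration: the false-positive count is multiplied by a factor kappa = (h / (1 - h)) * ((1 - r) / r), where h is the fraction of positive voxels in the ground truth and r is a hypothetical reference fraction; the default r = 0.001 and the function names are also illustrative. The linked repository remains the authoritative implementation.

```python
def confusion_counts(truth, pred):
    """Count true positives, false positives and false negatives
    over two equal-length flat sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    return tp, fp, fn


def dsc(truth, pred):
    """Plain Dice Similarity Coefficient: 2TP / (2TP + FP + FN)."""
    tp, fp, fn = confusion_counts(truth, pred)
    denom = 2 * tp + fp + fn
    # Convention chosen for this sketch: two empty masks agree perfectly.
    return 1.0 if denom == 0 else 2 * tp / denom


def ndsc(truth, pred, r=0.001):
    """Sketch of a normalised DSC (assumed form, see lead-in).

    False positives are rescaled by kappa so that the precision is
    evaluated as if the positive class had fraction r (assumed value),
    while the recall is left unchanged.
    """
    tp, fp, fn = confusion_counts(truth, pred)
    n = len(truth)
    positives = sum(1 for t in truth if t)
    if positives == 0 or positives == n:
        # Scaling factor is undefined here; fall back to plain DSC.
        kappa = 1.0
    else:
        h = positives / n
        kappa = (h / (1 - h)) * ((1 - r) / r)
    denom = 2 * tp + kappa * fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    truth = [1, 0, 0, 0]
    pred = [1, 1, 0, 0]
    print("DSC :", dsc(truth, pred))
    print("nDSC:", ndsc(truth, pred, r=0.25))
```

Under this assumed form, nDSC coincides with DSC whenever the positive fraction h equals the reference r, since kappa is then 1. For scans whose lesion load is above the reference, false positives are penalised more heavily, and for scans below it they are penalised less, which is the mechanism by which the dependence on lesion load is meant to be reduced.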