Segmentation has emerged as a fundamental task in computer vision and natural language processing: it assigns a label to every pixel/feature in order to extract regions of interest from an image/text. To evaluate segmentation performance, the Dice and IoU metrics are used to measure the degree of overlap between the ground truth and the predicted segmentation. In this paper, we establish a theoretical foundation of segmentation with respect to the Dice/IoU metrics, including the Bayes rule and Dice-/IoU-calibration, analogous to classification-calibration or Fisher consistency in classification. We prove that the existing thresholding-based framework, with most operating losses, is not consistent with respect to the Dice/IoU metrics and thus may lead to a suboptimal solution. To address this pitfall, we propose a novel consistent ranking-based framework, namely RankDice/RankIoU, inspired by plug-in rules of the Bayes segmentation rule. Three numerical algorithms with GPU parallel execution are developed to implement the proposed framework in large-scale and high-dimensional segmentation. We also study the statistical properties of the proposed framework: we show that it is Dice-/IoU-calibrated, and we provide its excess risk bounds and rate of convergence. The numerical effectiveness of RankDice/mRankDice is demonstrated on various simulated examples and on the Fine-annotated CityScapes, Pascal VOC and Kvasir-SEG datasets with state-of-the-art deep learning architectures.
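As a concrete illustration of the two overlap metrics named above, the following minimal sketch computes Dice and IoU for a pair of binary masks using their standard set-based definitions, Dice = 2|A ∩ B| / (|A| + |B|) and IoU = |A ∩ B| / |A ∪ B|. The function names `dice_score` and `iou_score`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for this example; they are not taken from the paper, and the sketch does not implement RankDice/RankIoU.

```python
import numpy as np


def dice_score(y_true, y_pred):
    """Dice coefficient between two binary masks (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    total = y_true.sum() + y_pred.sum()
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def iou_score(y_true, y_pred):
    """Intersection over union between two binary masks (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks: 3 true pixels, 3 predicted pixels, 2 of them shared.
truth = np.array([[1, 1, 0],
                  [0, 1, 0]])
pred = np.array([[1, 0, 0],
                 [0, 1, 1]])

dice = dice_score(truth, pred)  # 2 * 2 / (3 + 3)
iou = iou_score(truth, pred)    # 2 / 4
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two scores are tied by the identity IoU = Dice / (2 - Dice), so they rank individual predictions in the same order even though their values differ.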