Segmentation has emerged as a fundamental field of computer vision and natural language processing: it assigns a label to every pixel/feature in order to extract regions of interest from an image/text. To evaluate segmentation performance, the Dice and IoU metrics are used to measure the degree of overlap between the ground truth and the predicted segmentation. In this paper, we establish a theoretical foundation of segmentation with respect to the Dice/IoU metrics, including the Bayes rule and Dice-/IoU-calibration, analogous to classification-calibration or Fisher consistency in classification. We prove that the existing thresholding-based framework, with most operating losses, is not consistent with respect to the Dice/IoU metrics and thus may lead to a suboptimal solution. To address this pitfall, we propose a novel consistent ranking-based framework, namely RankDice/RankIoU, inspired by plug-in rules of the Bayes segmentation rule. Three numerical algorithms with GPU parallel execution are developed to implement the proposed framework in large-scale and high-dimensional segmentation. We also study the statistical properties of the proposed framework: we show that it is Dice-/IoU-calibrated, and we provide its excess risk bounds and rate of convergence. The numerical effectiveness of RankDice/mRankDice is demonstrated on various simulated examples and on the Fine-annotated CityScapes, Pascal VOC and Kvasir-SEG datasets with state-of-the-art deep learning architectures.
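To make the overlap metrics and the thresholding-versus-ranking contrast concrete, the following is a minimal sketch in Python and not the paper's GPU algorithms. It assumes binary masks, pixels that are independent Bernoulli variables given the image, and the convention that Dice and IoU equal 1 when both masks are empty. The helper names (`dice`, `iou`, `expected_dice`, `top_k_mask`) are hypothetical, and the expectation is computed by brute-force enumeration, which is only feasible for a handful of pixels.

```python
import itertools

import numpy as np


def dice(y_true, y_pred):
    """Dice = 2|A ∩ B| / (|A| + |B|); taken as 1 when both masks are empty (assumed convention)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    denom = y_true.sum() + y_pred.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(y_true, y_pred).sum() / denom


def iou(y_true, y_pred):
    """IoU = |A ∩ B| / |A ∪ B|; taken as 1 when both masks are empty (assumed convention)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    union = np.logical_or(y_true, y_pred).sum()
    if union == 0:
        return 1.0
    return np.logical_and(y_true, y_pred).sum() / union


def expected_dice(probs, y_pred):
    """Exact E[Dice] over all 2^d ground-truth masks, assuming independent pixels."""
    probs = np.asarray(probs, dtype=float)
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(probs)):
        y = np.array(bits, dtype=bool)
        # Probability of this particular ground-truth mask.
        weight = np.prod(np.where(y, probs, 1.0 - probs))
        total += weight * dice(y, y_pred)
    return total


def top_k_mask(probs, k):
    """Mark the k pixels with the largest probabilities as foreground."""
    probs = np.asarray(probs, dtype=float)
    mask = np.zeros(len(probs), dtype=bool)
    if k > 0:
        mask[np.argsort(-probs, kind="stable")[:k]] = True
    return mask


# Toy image with three pixels, each foreground with probability 0.45.
probs = np.array([0.45, 0.45, 0.45])

# Thresholding at 0.5 predicts an empty mask here.
threshold_score = expected_dice(probs, probs >= 0.5)

# Ranking: try every volume k and keep the one with the best expected Dice.
rank_scores = [expected_dice(probs, top_k_mask(probs, k)) for k in range(len(probs) + 1)]
best_k = int(np.argmax(rank_scores))

print(f"threshold at 0.5: expected Dice = {threshold_score:.4f}")
print(f"best top-k (k={best_k}): expected Dice = {rank_scores[best_k]:.4f}")
```

In this toy case no pixel reaches the 0.5 threshold, so thresholding returns an empty mask, whereas selecting all three top-ranked pixels gives a higher expected Dice. This illustrates, under the stated assumptions, why a fixed threshold can be suboptimal for an overlap metric.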