In clinical applications, the utility of segmentation models is often judged by the accuracy of derived downstream metrics such as organ size, rather than by the pixel-level accuracy of the segmentation masks themselves. Uncertainty quantification for such metrics is therefore crucial for decision-making. Conformal prediction (CP) is a popular framework for deriving principled uncertainty guarantees of this kind, but applying CP naively to the final scalar metric is inefficient because it treats the complex, non-linear segmentation-to-metric pipeline as a black box. We introduce COMPASS, a practical framework that generates efficient, metric-based CP intervals for image segmentation models by leveraging the inductive biases of their underlying deep neural networks. COMPASS performs calibration directly in the model's representation space by perturbing intermediate features along low-dimensional subspaces that are maximally sensitive to the target metric. We prove that COMPASS achieves valid marginal coverage under the assumption of exchangeability. Empirically, we demonstrate that COMPASS produces significantly tighter intervals than traditional CP baselines on four medical image segmentation tasks for area estimation of skin lesions and anatomical structures. Furthermore, we show that using learned internal features to estimate importance weights allows COMPASS to recover target coverage under covariate shift. COMPASS paves the way for practical, metric-based uncertainty quantification for medical image segmentation.
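For orientation, the naive baseline that the abstract contrasts against, split conformal prediction applied directly to the final scalar metric, can be sketched as follows. This is a minimal illustration and not the COMPASS procedure: the function name `split_conformal_interval`, the absolute-residual nonconformity score, and the toy area values are all assumptions made for the example.

```python
import numpy as np


def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Naive split CP on a scalar metric (e.g. lesion area).

    Hypothetical helper for illustration only. It treats the whole
    segmentation-to-metric pipeline as a black box and calibrates on
    absolute residuals of the scalar metric.
    """
    cal_pred = np.asarray(cal_pred, dtype=float)
    cal_true = np.asarray(cal_true, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)

    n = cal_pred.shape[0]
    # Nonconformity score: absolute error of the predicted metric (assumed choice).
    scores = np.sort(np.abs(cal_true - cal_pred))

    # Finite-sample corrected rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points the interval must be unbounded to stay valid.
    q = np.inf if k > n else scores[k - 1]

    # Same half-width for every test point: this is what makes the baseline loose.
    return test_pred - q, test_pred + q


# Toy calibration set: predicted vs. reference areas (arbitrary units).
cal_pred = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=float)
cal_true = cal_pred + np.array([1, -2, 3, -4, 5, -6, 7, -8, 9], dtype=float)

lower, upper = split_conformal_interval(cal_pred, cal_true, [100.0], alpha=0.5)
```

Under exchangeability of calibration and test samples, intervals built this way cover the true metric with probability at least 1 - alpha on average, but every test image receives the same half-width regardless of how difficult it is.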