Equipping predicted segmentations with calibrated uncertainty is essential for safety-critical applications. In this work, we focus on capturing the data-inherent uncertainty (also known as aleatoric uncertainty) in segmentation, which typically arises when input images are ambiguous. Because the output space is high-dimensional and ambiguous images may admit multiple segmentation modes, predicting well-calibrated uncertainty for segmentation remains challenging. To tackle this problem, we propose a novel mixture of stochastic experts (MoSE) model, in which each expert network estimates a distinct mode of the aleatoric uncertainty and a gating network predicts the probabilities of an input image being segmented in those modes. This yields an efficient two-level uncertainty representation. To learn the model, we develop a Wasserstein-like loss that directly minimizes the distribution distance between the MoSE and the ground-truth annotations. The loss can easily integrate traditional segmentation quality measures and can be efficiently optimized via constraint relaxation. We validate our method on the LIDC-IDRI dataset and a modified multimodal Cityscapes dataset. Results demonstrate that our method achieves state-of-the-art or competitive performance on all metrics.
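
The two-level representation described above can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not the MoSE implementation: the function name `sample_segmentations`, the per-expert logit maps, and the Gaussian noise scales are all hypothetical stand-ins for the outputs of the gating and expert networks. The first level picks a mode according to the gating probabilities; the second level draws a segmentation from the chosen expert.

```python
import numpy as np

def sample_segmentations(gate_probs, expert_logits, expert_scales, n_samples, rng):
    """Draw binary segmentation samples from a toy mixture of stochastic experts.

    gate_probs:    (K,) mixture weights (stand-in for the gating network output)
    expert_logits: (K, H, W) mean foreground logits, one map per expert
    expert_scales: (K,) Gaussian noise scale per expert (hypothetical parameterization)
    """
    K, H, W = expert_logits.shape
    # Level 1: choose which mode (expert) each sample comes from.
    expert_ids = rng.choice(K, size=n_samples, p=gate_probs)
    # Level 2: perturb the chosen expert's logits to get within-mode variation.
    noise = rng.standard_normal((n_samples, H, W))
    logits = expert_logits[expert_ids] + expert_scales[expert_ids, None, None] * noise
    # Threshold the perturbed logits to obtain binary masks.
    masks = (logits > 0).astype(np.uint8)
    return expert_ids, masks

# Toy usage: expert 0 predicts "no lesion", expert 1 predicts "lesion everywhere".
rng = np.random.default_rng(0)
gate_probs = np.array([0.7, 0.3])
expert_logits = np.stack([np.full((4, 4), -5.0), np.full((4, 4), 5.0)])
expert_scales = np.array([0.1, 0.1])
expert_ids, masks = sample_segmentations(gate_probs, expert_logits, expert_scales, 2000, rng)
```

In this sketch the gating probabilities control how often each mode appears among the samples, while the noise scale controls how much the samples vary within a mode.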