Many segmentation tasks, such as medical image segmentation or future state prediction, are inherently ambiguous, meaning that multiple predictions are equally correct. Current methods typically rely on generative models to capture this uncertainty. However, identifying the underlying modes of the distribution with these methods is computationally expensive, requiring large numbers of samples and post-hoc clustering. In this paper, we shift the focus from stochastic sampling to the direct generation of likely outcomes. We introduce mode proposal models, a deterministic framework that efficiently produces a fixed-size set of proposal masks in a single forward pass. To handle superfluous proposals, we adapt a confidence mechanism, traditionally used in object detection, to the high-dimensional space of segmentation masks. Our approach significantly reduces inference time while achieving higher ground-truth coverage than existing generative models. Furthermore, we demonstrate that our model can be trained without knowing the full distribution of outcomes, making it applicable to real-world datasets. Finally, we show that by decomposing the velocity field of a pre-trained flow model, we can efficiently estimate prior mode probabilities for our proposals.
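To make the idea of a fixed-size proposal set with per-proposal confidences more concrete, the following is a minimal sketch of how such a set might be filtered and scored against several valid ground-truth masks. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not define the coverage metric, so coverage is taken here to be the mean, over ground-truth masks, of the best IoU achieved by any retained proposal. The function names `iou`, `keep_confident`, and `coverage`, the threshold of 0.5, and the toy 2x2 masks are all hypothetical.

```python
from typing import List, Sequence

# A binary mask is represented as nested rows of 0/1 values (assumption:
# a real system would use tensors; plain lists keep the sketch dependency-free).
Mask = Sequence[Sequence[int]]


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    # Two empty masks agree perfectly, so treat that case as IoU 1.
    return 1.0 if union == 0 else inter / union


def keep_confident(
    proposals: Sequence[Mask],
    confidences: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Mask]:
    """Drop superfluous proposals whose confidence falls below the threshold."""
    return [m for m, c in zip(proposals, confidences) if c >= threshold]


def coverage(proposals: Sequence[Mask], ground_truths: Sequence[Mask]) -> float:
    """Assumed metric: mean over ground truths of the best IoU of any proposal."""
    if not proposals or not ground_truths:
        return 0.0
    best = [max(iou(p, g) for p in proposals) for g in ground_truths]
    return sum(best) / len(best)


# Toy example: three proposals, two equally valid ground-truth masks.
proposals = [[[1, 1], [0, 0]], [[0, 0], [1, 1]], [[1, 0], [1, 0]]]
confidences = [0.9, 0.8, 0.1]
ground_truths = [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]

kept = keep_confident(proposals, confidences)
print(len(kept), coverage(kept, ground_truths))
```

In this toy case the low-confidence third proposal is discarded and each ground-truth mask is matched exactly by one of the two remaining proposals. The point of the sketch is only that a deterministic proposal set can be evaluated directly, without drawing many samples and clustering them afterwards.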