To fully understand the 3D context of a single image, a visual system must segment both the visible and occluded regions of objects while discerning their occlusion order. Ideally, the system should handle arbitrary objects rather than being restricted to a limited set of object classes, especially in robotic applications. Addressing this need, we introduce a diffusion model with cumulative occlusion learning designed for sequential amodal segmentation of objects with uncertain categories. The model iteratively refines its predictions using a cumulative mask strategy during diffusion, effectively capturing the uncertainty of invisible regions and reproducing the complex distribution of shapes and occlusion orders of occluded objects. This mirrors the human capability for amodal perception: deciphering the spatial ordering among objects and accurately predicting complete contours for occluded objects in densely layered visual scenes. Experimental results on three amodal datasets show that our method outperforms established baselines.
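The sequential prediction with a cumulative mask can be sketched as follows. This is a minimal illustration of the idea only, not the authors' implementation: `predict_mask` stands in for one reverse-diffusion sampling pass, and its interface (image plus cumulative mask in, one amodal mask out) is an assumption made for this sketch.

```python
import numpy as np

def sequential_amodal_segmentation(image, predict_mask, max_objects=5):
    """Sketch of cumulative occlusion learning (hypothetical interface).

    At each step the predictor is conditioned on the image and on the
    cumulative mask -- the union of all amodal masks predicted so far --
    so every new prediction accounts for previously segmented occluders.
    Returns amodal masks in front-to-back occlusion order.
    """
    h, w = image.shape[:2]
    cumulative = np.zeros((h, w), dtype=bool)  # union of predicted masks
    masks = []
    for _ in range(max_objects):
        # In the paper this step would be one diffusion sampling pass;
        # here `predict_mask` is any callable with the same conditioning.
        mask = predict_mask(image, cumulative)
        if mask is None or not mask.any():
            break  # no further objects to segment
        masks.append(mask)
        cumulative |= mask  # accumulate occlusion context for the next step
    return masks

# Toy predictor for demonstration: emits two hard-coded overlapping
# squares (front object first), then signals completion with None.
def toy_predictor(image, cumulative):
    h, w = image.shape[:2]
    front = np.zeros((h, w), bool); front[2:6, 2:6] = True
    back = np.zeros((h, w), bool); back[4:8, 4:8] = True
    for m in (front, back):
        if not np.array_equal(m & cumulative, m):  # not yet predicted
            return m
    return None

image = np.zeros((10, 10, 3))
masks = sequential_amodal_segmentation(image, toy_predictor)
```

With the toy predictor, the loop yields two full (amodal) 4x4 masks even though the second square is partly occluded by the first, and their predicted order encodes the occlusion order.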