We study amodal image segmentation: predicting complete object segmentation masks that include both the visible and the invisible (occluded) parts. In previous work, amodal segmentation ground truth on real images is usually produced by manual annotation and is therefore subjective. In contrast, we use 3D data to establish an automatic pipeline that determines authentic ground-truth amodal masks for partially occluded objects in real images. We use this pipeline to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels. To better handle the amodal completion task in the wild, we explore two architecture variants: a two-stage model that first infers the occluder and then completes the amodal mask; and a one-stage model that exploits the representation power of Stable Diffusion for amodal segmentation across many categories. Without bells and whistles, our method achieves new state-of-the-art performance on amodal segmentation datasets that cover a large variety of objects, including COCOA and our new MP3D-Amodal dataset. The dataset, model, and code are available at https://www.robots.ox.ac.uk/~vgg/research/amodal/.