This paper studies amodal image segmentation: predicting entire object segmentation masks, including both visible and invisible (occluded) parts. In previous work, the amodal segmentation ground truth on real images is usually produced by manual annotation and is thus subjective. In contrast, we use 3D data to establish an automatic pipeline that determines authentic ground-truth amodal masks for partially occluded objects in real images. This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels. To better handle the amodal completion task in the wild, we explore two architecture variants: a two-stage model that first infers the occluder and then completes the amodal mask; and a one-stage model that exploits the representation power of Stable Diffusion for amodal segmentation across many categories. Without bells and whistles, our method achieves new state-of-the-art performance on amodal segmentation datasets that cover a large variety of objects, including COCOA and our new MP3D-Amodal dataset. The dataset, model, and code are available at https://www.robots.ox.ac.uk/~vgg/research/amodal/.
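To illustrate how 3D data can yield amodal ground truth without manual annotation, the following is a minimal sketch and not the pipeline used in this work, whose details the abstract does not specify. It assumes two hypothetical inputs from the same camera: a depth map of the object rendered alone and a depth map of the full scene. Under that assumption, the amodal mask is the object's full projected footprint, and the visible mask is the subset of pixels where the object is the nearest surface. The function names `masks_from_depth` and `mask_iou` and the tolerance `tol` are illustrative choices.

```python
import math


def masks_from_depth(object_depth, scene_depth, tol=1e-3):
    """Derive amodal and visible masks from two depth maps (illustrative assumption).

    object_depth: 2D list, depth of the object rendered alone (math.inf where absent).
    scene_depth:  2D list, depth of the full scene from the same camera.
    """
    amodal, visible = [], []
    for obj_row, scene_row in zip(object_depth, scene_depth):
        # Amodal footprint: every pixel the object projects to, occluded or not.
        amodal_row = [math.isfinite(d) for d in obj_row]
        # Visible part: the object is the closest surface at that pixel.
        visible_row = [
            a and d <= s + tol
            for a, d, s in zip(amodal_row, obj_row, scene_row)
        ]
        amodal.append(amodal_row)
        visible.append(visible_row)
    return amodal, visible


def mask_iou(pred, target):
    """Intersection over union of two boolean masks given as 2D lists."""
    inter = sum(p and t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    union = sum(p or t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return inter / union if union else 1.0


# Toy 2x3 example: the object covers the two right-hand columns,
# and an occluder at depth 1.0 hides the rightmost column.
inf = math.inf
object_depth = [[inf, 2.0, 2.0], [inf, 2.0, 2.0]]
scene_depth = [[5.0, 2.0, 1.0], [5.0, 2.0, 1.0]]
amodal, visible = masks_from_depth(object_depth, scene_depth)
```

In this toy case the visible mask covers half of the amodal mask, so `mask_iou(visible, amodal)` is 0.5; an amodal completion model would be scored by comparing its predicted mask against `amodal` in the same way.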