We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions. By capitalizing on large-scale diffusion models and transferring their representations to this task, we learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases, including examples that break natural and physical priors, such as art. As training data, we use a synthetically curated dataset containing occluded objects paired with their whole counterparts. Experiments show that our approach outperforms supervised baselines on established benchmarks. Our model can furthermore be used to significantly improve the performance of existing object recognition and 3D reconstruction methods in the presence of occlusions.
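The abstract does not say how the occluded/whole training pairs are built. One plausible reading, sketched here purely as an assumption, is to overlay an occluder on an object that is known to be whole and keep the untouched object as the target. The helper names `paste_occluder` and `occlusion_rate` and the toy binary masks are hypothetical and are not taken from the pix2gestalt code.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = object pixel, 0 = background


def paste_occluder(whole: Mask, occluder: Mask, top: int, left: int) -> Tuple[Mask, Mask]:
    """Overlay `occluder` on `whole` with its top-left corner at (top, left).

    Returns (visible, amodal): the part of the object still visible after
    occlusion, and an unchanged copy of the whole-object mask (the target).
    This is an illustrative assumption, not the authors' pipeline.
    """
    height, width = len(whole), len(whole[0])
    visible = [row[:] for row in whole]  # copy so the input is not mutated
    for i, occ_row in enumerate(occluder):
        for j, occ in enumerate(occ_row):
            y, x = top + i, left + j
            # Occluder pixels falling outside the image are ignored.
            if occ and 0 <= y < height and 0 <= x < width:
                visible[y][x] = 0
    amodal = [row[:] for row in whole]
    return visible, amodal


def occlusion_rate(visible: Mask, amodal: Mask) -> float:
    """Fraction of the whole object's pixels hidden by the occluder."""
    total = sum(map(sum, amodal))
    seen = sum(map(sum, visible))
    return 0.0 if total == 0 else 1.0 - seen / total


# A 4x4 square object inside a 6x6 image, and a 2x3 rectangular occluder.
whole = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
occluder = [[1, 1, 1], [1, 1, 1]]

visible, amodal = paste_occluder(whole, occluder, top=3, left=3)
print(occlusion_rate(visible, amodal))
```

In a setup like this, `visible` would play the role of the conditioning input and `amodal` the reconstruction target. A real pipeline would composite RGB images and filter the source objects for completeness, which this sketch does not attempt.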