We introduce SAMPro3D for zero-shot 3D indoor scene segmentation. Given the 3D point cloud and multiple posed 2D frames of a scene, our approach segments the scene by applying the pretrained Segment Anything Model (SAM) to the 2D frames. Our key idea is to locate 3D points in the scene as natural 3D prompts and align their projected pixel prompts across frames, ensuring frame consistency in both the pixel prompts and their SAM-predicted masks. Moreover, we filter out low-quality 3D prompts based on feedback from all 2D frames to enhance segmentation quality. We also consolidate different 3D prompts when they segment the same object, yielding a more comprehensive segmentation. Notably, our method requires no additional training on domain-specific data, which preserves the zero-shot power of SAM. Extensive qualitative and quantitative results show that our method consistently achieves higher-quality and more diverse segmentation than previous zero-shot or fully supervised approaches, and in many cases even surpasses human-level annotations. The project page can be accessed at https://mutianxu.github.io/sampro3d/.
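The central step is turning one 3D point into a pixel prompt in every posed frame. It can be sketched with a standard pinhole projection. This is a minimal illustration under our own assumptions, not the SAMPro3D implementation. `Frame`, `project_prompt`, `project_to_frames`, and the depth-consistency tolerance `depth_tol` are hypothetical names. We also assume each frame provides a 3x3 intrinsic matrix `K` and a 4x4 world-to-camera matrix. A point that falls behind the camera or outside the image yields `None`. The same happens when an optional depth map disagrees with the point's depth, an assumed occlusion test. Only the remaining pixels would be passed to SAM as point prompts.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[int, int]  # (u, v) = (column, row)


@dataclass
class Frame:
    """One posed 2D frame (hypothetical container, not from the paper)."""
    K: Sequence[Sequence[float]]             # 3x3 pinhole intrinsics
    world_to_cam: Sequence[Sequence[float]]  # 4x4 extrinsics (world -> camera)
    width: int
    height: int
    depth: Optional[Sequence[Sequence[float]]] = None  # optional depth map, indexed depth[v][u]


def project_prompt(point: Point3D, frame: Frame, depth_tol: float = 0.05) -> Optional[Pixel]:
    """Project a 3D prompt into one frame; return its pixel prompt, or None if not visible."""
    x, y, z = point
    T = frame.world_to_cam
    # Rigid transform of the world point into camera coordinates (top 3 rows of T).
    xc, yc, zc = (T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3))
    if zc <= 0:  # point lies behind the camera
        return None
    K = frame.K
    # Pinhole projection: [u, v, 1]^T is proportional to K [xc, yc, zc]^T.
    u = (K[0][0] * xc + K[0][1] * yc + K[0][2] * zc) / zc
    v = (K[1][0] * xc + K[1][1] * yc + K[1][2] * zc) / zc
    ui, vi = int(round(u)), int(round(v))
    if not (0 <= ui < frame.width and 0 <= vi < frame.height):  # outside the image
        return None
    # Assumed occlusion test: the observed depth must match the point's camera depth.
    if frame.depth is not None and abs(frame.depth[vi][ui] - zc) > depth_tol:
        return None
    return (ui, vi)


def project_to_frames(point: Point3D, frames: List[Frame]) -> List[Optional[Pixel]]:
    """The same 3D prompt yields one pixel prompt (or None) per frame."""
    return [project_prompt(point, f) for f in frames]
```

Because every per-frame pixel prompt derives from the same 3D point, the prompts, and the SAM masks they produce, refer to the same physical location across frames. This is the frame consistency described above. The filtering and consolidation of 3D prompts would operate on top of these per-frame masks and are not sketched here.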