The Segment Anything Model (SAM) has proven effective at segmenting any object or part in a wide range of 2D images, yet its capability in 3D has not been fully explored. The real world is composed of numerous 3D scenes and objects. Because accessible 3D data is scarce and costly to acquire and annotate, lifting SAM to 3D is a challenging but valuable research avenue. With this in mind, we propose a novel framework to Segment Anything in 3D, named SA3D. Given a neural radiance field (NeRF) model, SA3D allows users to obtain the 3D segmentation of any target object from a single round of manual prompting in one rendered view. Given the input prompts, SAM cuts out the target object in the corresponding view. The resulting 2D segmentation mask is projected onto 3D mask grids via density-guided inverse rendering. 2D masks are then rendered from other views; these are mostly incomplete, but they serve as cross-view self-prompts that are fed into SAM again. SAM returns complete masks, which are in turn projected onto the mask grids. Repeating this procedure iteratively yields accurate 3D masks. SA3D adapts effectively to various radiance fields without any redesign. The entire segmentation process takes approximately two minutes without any engineering optimization. Our experiments demonstrate the effectiveness of SA3D in different scenes, highlighting the potential of SAM in 3D scene perception. The project page is at https://jumpat.github.io/SA3D/.
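The loop described in the abstract (prompt one view, project the 2D mask into 3D with density weighting, render incomplete masks in other views, self-prompt, repeat) can be illustrated with a toy NumPy sketch. This is not the SA3D implementation, and everything in it is an assumption made for illustration: a voxel opacity grid stands in for the NeRF, the three grid axes stand in for camera views, `toy_segmenter` is a hypothetical label-lookup oracle standing in for SAM, the signed update in `project_mask` is one simple choice of projection rule, and all function names are invented here.

```python
import numpy as np

def ray_weights(alpha):
    """Volume-rendering weights w_i = T_i * alpha_i along the last axis."""
    # Transmittance T_i = prod_{j<i} (1 - alpha_j); the first sample has T = 1.
    trans = np.cumprod(1.0 - alpha, axis=-1)
    trans = np.concatenate([np.ones_like(trans[..., :1]), trans[..., :-1]], axis=-1)
    return trans * alpha

def render_mask(mask_grid, alpha, axis):
    """Render a soft 2D mask by compositing 3D mask values along `axis`."""
    a = np.moveaxis(alpha, axis, -1)
    m = np.moveaxis(mask_grid, axis, -1)
    return (ray_weights(a) * m).sum(axis=-1)

def project_mask(mask_grid, alpha, axis, mask_2d, step=1.0):
    """Push a 2D mask back into the 3D grid, weighted by density along each ray.

    Assumed update rule: voxels on rays inside the mask gain +w, others lose w.
    """
    w = ray_weights(np.moveaxis(alpha, axis, -1))
    update = w * (2.0 * mask_2d[..., None] - 1.0)
    return mask_grid + step * np.moveaxis(update, -1, axis)

def toy_segmenter(label_img, point):
    """Hypothetical stand-in for SAM: return the region sharing the prompt's label."""
    return (label_img == label_img[point]).astype(float)

def segment_3d(alpha, labels, prompt, views=(0, 1, 2)):
    """One manual prompt in the first view, self-prompts in all later views."""
    mask_grid = np.zeros_like(alpha)
    for n, axis in enumerate(views):
        label_img = labels.max(axis=axis)  # toy per-view "image" for the oracle
        if n == 0:
            point = prompt  # the single manual prompt
        else:
            # Self-prompt: most confident pixel of the (incomplete) rendered mask.
            rendered = render_mask(mask_grid, alpha, axis)
            point = np.unravel_index(np.argmax(rendered), rendered.shape)
        mask_2d = toy_segmenter(label_img, point)
        mask_grid = project_mask(mask_grid, alpha, axis, mask_2d)
    return mask_grid > 0

# Toy scene: a target cube (label 1) and a distractor cube (label 2).
alpha = np.zeros((8, 8, 8))
labels = np.zeros((8, 8, 8), dtype=int)
alpha[1:4, 1:4, 1:4], labels[1:4, 1:4, 1:4] = 0.9, 1
alpha[5:7, 5:7, 5:7], labels[5:7, 5:7, 5:7] = 0.9, 2

result = segment_3d(alpha, labels, prompt=(2, 2))
```

In this toy scene, `result` is a boolean voxel grid that should coincide with the target cube. After only the first projection, the mask rendered from a second view covers just part of the target's silhouette, which is the situation the cross-view self-prompting step is meant to repair.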