The proliferation of 2D foundation models has sparked research into adapting them for open-world 3D instance segmentation. Recent methods introduce a paradigm that leverages superpoints as geometric primitives and incorporates 2D multi-view masks from the Segment Anything Model (SAM) as merging guidance, achieving outstanding zero-shot instance segmentation results. However, the limited use of 3D priors restricts segmentation performance. Previous methods compute 3D superpoints solely from normals estimated on spatial coordinates, resulting in under-segmentation of instances with similar geometry. Moreover, the heavy reliance on SAM and hand-crafted algorithms in 2D space leads to over-segmentation due to SAM's inherent tendency toward part-level segmentation. To address these issues, we propose SA3DIP, a novel method for Segmenting Any 3D Instances by exploiting potential 3D Priors. Specifically, on one hand, we generate complementary 3D primitives based on both geometric and textural priors, reducing the initial errors that accumulate in subsequent procedures. On the other hand, we introduce supplemental constraints from 3D space by using a 3D detector to guide a further merging process. Furthermore, we observe a considerable portion of low-quality ground-truth annotations in the ScanNetV2 benchmark, which hinders fair evaluation. We therefore present ScanNetV2-INS, with complete ground-truth labels and additional annotated instances, for 3D class-agnostic instance segmentation. Experimental evaluations on various 2D-3D datasets demonstrate the effectiveness and robustness of our approach. Our code and the proposed ScanNetV2-INS dataset are available HERE.
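The intuition behind combining geometric and textural priors can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's actual superpoint algorithm): a union-find clustering over a point-neighborhood graph that merges neighbors only when both normals and colors agree, so a color constraint can separate two coplanar regions that a normals-only criterion would fuse into one superpoint.

```python
import numpy as np

def superpoint_labels(normals, colors, edges, ang_thresh=0.9, col_thresh=0.2):
    """Toy superpoint clustering: merge two neighboring points only if
    their normals are similar (dot product above ang_thresh, the geometric
    prior) AND their colors are close (Euclidean distance below col_thresh,
    the textural prior). Thresholds are illustrative, not from the paper."""
    n = len(normals)
    parent = list(range(n))

    def find(i):  # union-find root lookup with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        geo_ok = float(np.dot(normals[i], normals[j])) > ang_thresh
        tex_ok = float(np.linalg.norm(colors[i] - colors[j])) < col_thresh
        if geo_ok and tex_ok:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
    return [find(i) for i in range(n)]

# Four coplanar points (identical normals) in a chain; the first two are
# white, the last two red. A normals-only criterion would merge all four;
# the color constraint splits them into two superpoints.
normals = np.array([[0.0, 0.0, 1.0]] * 4)
colors = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0],
                   [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
edges = [(0, 1), (1, 2), (2, 3)]
labels = superpoint_labels(normals, colors, edges)
```

Here `labels` assigns points 0 and 1 to one cluster and points 2 and 3 to another, mirroring the abstract's claim that purely geometric superpoints under-segment instances with similar geometry but different appearance.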