3D decomposition/segmentation remains a challenge because large-scale annotated 3D data is not readily available. Contemporary approaches typically leverage machine-generated 2D segments and fuse them to achieve 3D consistency. Most of these methods are based on NeRFs, and they share a potential weakness: the instance/semantic embedding features are produced by independent MLPs. This prevents the segmentation network from learning the geometric details of objects directly through radiance and density. In this paper, we propose ClusteringSDF, a novel approach that achieves both segmentation and reconstruction in 3D via a neural implicit surface representation, specifically the Signed Distance Function (SDF), in which segmentation rendering is directly integrated with the volume rendering of neural implicit surfaces. Although built on ObjectSDF++, ClusteringSDF no longer requires ground-truth segments for supervision. It retains the ability to reconstruct individual object surfaces while relying purely on noisy and inconsistent labels from pre-trained models. At the core of ClusteringSDF, we introduce a highly efficient clustering mechanism for lifting 2D labels to 3D. Experimental results on challenging scenes from the ScanNet and Replica datasets show that ClusteringSDF achieves competitive performance compared with state-of-the-art methods while requiring significantly less training time.
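The central architectural claim is that segmentation is rendered with the same volume-rendering weights as the geometry, so segmentation gradients flow through the SDF itself. The pure-Python sketch below illustrates that coupling for a single ray under several assumptions that are not taken from the paper: a VolSDF-style density (a Laplace CDF of the negated SDF), a scene SDF formed as the minimum over per-object SDFs (the union used in ObjectSDF), and a hypothetical per-sample segmentation given by a softmax over negated object SDFs with a temperature `tau`. It is not ClusteringSDF's exact formulation and does not include the clustering mechanism.

```python
import math


def laplace_cdf(x, beta):
    """CDF of a zero-mean Laplace distribution with scale beta."""
    if x <= 0:
        return 0.5 * math.exp(x / beta)
    return 1.0 - 0.5 * math.exp(-x / beta)


def sdf_to_density(sdf, alpha=10.0, beta=0.1):
    """Assumed VolSDF-style density: large inside the surface (sdf < 0), small outside."""
    return alpha * laplace_cdf(-sdf, beta)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def render_segmentation(object_sdfs_per_sample, deltas, tau=0.05):
    """Alpha-composite per-sample segmentation vectors along one ray.

    object_sdfs_per_sample: one entry per sample; each entry lists the K
        per-object SDF values at that sample.
    deltas: spacing between consecutive samples along the ray.
    tau: hypothetical softmax temperature for the per-sample segmentation.
    Returns a K-dimensional rendered segmentation vector for the ray.
    """
    num_objects = len(object_sdfs_per_sample[0])
    rendered = [0.0] * num_objects
    transmittance = 1.0
    for obj_sdfs, delta in zip(object_sdfs_per_sample, deltas):
        # Scene surface as the union of object surfaces (minimum SDF).
        scene_sdf = min(obj_sdfs)
        sigma = sdf_to_density(scene_sdf)
        opacity = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * opacity  # same weight used for color/depth
        # Hypothetical choice: the object with the smallest SDF dominates.
        seg = softmax([-d / tau for d in obj_sdfs])
        for k in range(num_objects):
            rendered[k] += weight * seg[k]
        transmittance *= 1.0 - opacity
    return rendered


# A ray that crosses object 0's surface while staying outside object 1.
sdfs = [[0.5, 0.9], [0.2, 0.7], [-0.05, 0.4], [-0.3, 0.2]]
print(render_segmentation(sdfs, [0.25] * 4))
```

Because the compositing weights come from the scene SDF, a pixel's segmentation can only concentrate on an object where the geometry places a surface, which is the kind of direct geometry-segmentation coupling the abstract contrasts with independent embedding MLPs.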