This paper addresses 3D instance segmentation by jointly leveraging 3D geometric and multi-view image information. Many previous works have applied deep learning techniques to 3D point clouds for instance segmentation. However, these methods often fail to generalize to diverse scene types because labeled 3D point cloud data is scarce and lacks diversity. Some recent works have attempted to lift 2D instance segmentations to 3D within a bottom-up framework, but inconsistencies in the 2D instance segmentations across views can substantially degrade 3D segmentation performance. In this work, we introduce a novel 3D-to-2D query framework that effectively exploits 2D segmentation models for 3D instance segmentation. Specifically, we pre-segment the scene into superpoints in 3D and formulate the task as a graph cut problem. The superpoint graph is constructed based on 2D segmentation models: node features are obtained from multi-view image features, and edge weights are computed from multi-view segmentation results, which improves generalization. To process the graph, we train a graph neural network using pseudo 3D labels from 2D segmentation models. Experimental results on the ScanNet, ScanNet++, and KITTI-360 datasets demonstrate that our method achieves robust segmentation performance and generalizes across different types of scenes. Our project page is available at https://zju3dv.github.io/sam_graph.
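To make the idea of edge weights computed from multi-view segmentation results more concrete, the following is a minimal sketch and not the paper's actual formulation, which the abstract does not specify. It assumes that each superpoint has already been assigned, per view, the id of the 2D mask it projects into (or -1 when it is not visible), and it scores each superpoint pair by the fraction of co-visible views in which both fall into the same 2D mask. The function name `superpoint_edge_weights`, the input layout, and the agreement-ratio scoring are all illustrative assumptions.

```python
from itertools import combinations


def superpoint_edge_weights(mask_ids_per_view):
    """Score superpoint pairs by multi-view 2D mask agreement.

    mask_ids_per_view[v][s] is the id of the 2D mask that superpoint s
    projects into in view v, or -1 if s is not visible in that view.
    (Hypothetical input layout, assumed for illustration only.)

    Returns a dict mapping (i, j) with i < j to a weight in [0, 1]:
    the fraction of co-visible views where i and j share a 2D mask.
    Pairs that are never co-visible get no entry.
    """
    num_superpoints = len(mask_ids_per_view[0])
    weights = {}
    for i, j in combinations(range(num_superpoints), 2):
        co_visible = 0  # views in which both superpoints are visible
        agree = 0       # co-visible views in which they share a mask
        for view in mask_ids_per_view:
            if view[i] < 0 or view[j] < 0:
                continue
            co_visible += 1
            if view[i] == view[j]:
                agree += 1
        if co_visible > 0:
            weights[(i, j)] = agree / co_visible
    return weights


# Toy example: 3 views, 3 superpoints.
example_views = [
    [0, 0, 1],   # view 0: superpoints 0 and 1 share mask 0
    [2, 2, -1],  # view 1: superpoint 2 is not visible
    [3, 4, 4],   # view 2: superpoints 1 and 2 share mask 4
]
edge_weights = superpoint_edge_weights(example_views)
print(edge_weights)
```

In this toy input, superpoints 0 and 1 agree in two of their three co-visible views, which illustrates how inconsistent 2D segmentations across views yield a soft weight rather than a hard merge decision. In the described method, a graph neural network trained on pseudo 3D labels processes the graph; the sketch covers only a plausible edge-weight step.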