This paper addresses the challenge of 3D instance segmentation by jointly leveraging 3D geometric and multi-view image information. Many previous works have applied deep learning techniques to 3D point clouds for instance segmentation. However, these methods often fail to generalize to diverse scene types because labeled 3D point cloud data is scarce and lacks diversity. Some recent works have attempted to lift 2D instance segmentations to 3D within a bottom-up framework, but inconsistencies among the 2D instance segmentations of different views can substantially degrade 3D segmentation performance. In this work, we introduce a novel 3D-to-2D query framework that effectively exploits 2D segmentation models for 3D instance segmentation. Specifically, we pre-segment the scene into superpoints in 3D and formulate the task as a graph cut problem. The superpoint graph is constructed from the outputs of 2D segmentation models: node features are obtained from multi-view image features, and edge weights are computed from multi-view segmentation results, which improves generalization. To process the graph, we train a graph neural network using pseudo 3D labels derived from 2D segmentation models. Experimental results on the ScanNet, ScanNet++ and KITTI-360 datasets demonstrate that our method achieves robust segmentation performance and generalizes across different types of scenes. Our project page is available at https://zju3dv.github.io/sam_graph.
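To make the superpoint graph idea concrete, the following is a minimal sketch and not the method's actual implementation. It assumes that each superpoint has already been projected into every view and assigned the id of the 2D mask it falls into (`-1` when not visible), and it scores an edge by the fraction of co-visible views in which both superpoints share a mask. This scoring rule, the function names `edge_weights` and `merge_superpoints`, and the threshold value are all hypothetical. The merging step uses plain thresholding with union-find as a simple stand-in for the learned graph neural network described in the abstract.

```python
import numpy as np


def edge_weights(mask_ids, edges):
    """Score each superpoint edge by multi-view mask agreement (assumed rule).

    mask_ids: (V, S) int array; mask_ids[v, s] is the 2D mask id of
              superpoint s in view v, or -1 if s is not visible in v.
    edges:    (E, 2) int array of adjacent superpoint index pairs.
    Returns an (E,) float array: the fraction of co-visible views in which
    both superpoints fall into the same 2D mask (0 if never co-visible).
    """
    a = mask_ids[:, edges[:, 0]]          # (V, E) mask ids of first endpoint
    b = mask_ids[:, edges[:, 1]]          # (V, E) mask ids of second endpoint
    covisible = (a >= 0) & (b >= 0)       # both endpoints seen in the view
    agree = covisible & (a == b)          # both endpoints in the same mask
    n_cov = covisible.sum(axis=0)
    return np.divide(
        agree.sum(axis=0), n_cov,
        out=np.zeros(len(edges)), where=n_cov > 0,
    )


def merge_superpoints(num_superpoints, edges, weights, threshold=0.5):
    """Group superpoints by keeping edges with weight above the threshold.

    A simple stand-in for the learned graph cut: connected components over
    the retained edges, computed with union-find. Each superpoint is
    labeled with the smallest superpoint index in its component.
    """
    parent = list(range(num_superpoints))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), w in zip(edges, weights):
        if w > threshold:
            ri, rj = find(int(i)), find(int(j))
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)
    return np.array([find(s) for s in range(num_superpoints)])


# Toy input: 3 views, 4 superpoints, a chain of 3 edges (all values made up).
mask_ids = np.array([
    [1, 1, 2, -1],
    [5, 5, 5, 7],
    [3, 3, 4, 4],
])
edges = np.array([[0, 1], [1, 2], [2, 3]])
weights = edge_weights(mask_ids, edges)
labels = merge_superpoints(4, edges, weights)
```

Averaging agreement over views is one way to soften the cross-view inconsistency the abstract mentions: a single view that over- or under-segments an object changes an edge weight only fractionally rather than deciding the merge outright.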