Panoptic reconstruction is a challenging task in 3D scene understanding. Most existing methods, however, rely heavily on pre-trained semantic segmentation models and known 3D object bounding boxes for 3D panoptic segmentation, neither of which is available for in-the-wild scenes. In this paper, we propose a novel zero-shot panoptic reconstruction method that operates on RGB-D images of a scene. For zero-shot segmentation, we leverage open-vocabulary instance segmentation, which raises two challenges: partial labeling and instance association. We tackle both by propagating partial labels with the aid of dense generalized features and by building a 3D instance graph to associate 2D instance IDs. Specifically, we exploit the partial labels to learn a classifier over generalized semantic features, which provides complete labels for scenes via dense distilled features. Moreover, we formulate instance association as a 3D instance graph segmentation problem, allowing us to fully exploit the scene geometry prior and all 2D instance masks to infer globally unique pseudo 3D instance IDs. Our method outperforms state-of-the-art methods on the indoor ScanNet V2 dataset and the outdoor KITTI-360 dataset, demonstrating the effectiveness of our graph segmentation method and reconstruction network.
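The instance-association idea above can be illustrated with a minimal sketch: treat each 2D instance mask as a graph node (represented here by the set of 3D surface points it covers), connect masks whose 3D footprints overlap, and take connected components as pseudo 3D instances. All names, the IoU criterion, and the threshold below are illustrative assumptions, not the paper's actual formulation.

```python
# Hedged sketch of instance association as 3D instance graph segmentation.
# Assumption: each 2D mask is reduced to the set of 3D point indices it
# observes; the real method uses richer geometry and graph segmentation.
from itertools import combinations

def associate_instances(masks, iou_thresh=0.3):
    """masks: list of sets of 3D point indices, one per 2D instance mask.
    Returns one pseudo 3D instance ID per input mask."""
    n = len(masks)
    parent = list(range(n))  # union-find over mask nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Add an edge between masks whose 3D footprints overlap (IoU test).
    for i, j in combinations(range(n), 2):
        inter = len(masks[i] & masks[j])
        union = len(masks[i] | masks[j])
        if union and inter / union >= iou_thresh:
            parent[find(i)] = find(j)

    # Relabel component roots to compact, globally unique IDs.
    ids, out = {}, []
    for i in range(n):
        r = find(i)
        ids.setdefault(r, len(ids))
        out.append(ids[r])
    return out

# Toy example: masks 0 and 1 see the same object from two frames.
masks = [{1, 2, 3, 4}, {3, 4, 5}, {10, 11, 12}]
print(associate_instances(masks))  # → [0, 0, 1]
```

A full pipeline would replace the binary IoU edges with weighted edges and a proper graph segmentation, but the output contract is the same: every 2D mask of one physical object maps to a single global pseudo 3D instance ID.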