Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models

Foundation models have made significant strides in 2D and language tasks such as image segmentation, object detection, and visual-language understanding. Their potential to improve 3D scene representation learning, however, remains largely untapped because of the domain gap. In this paper, we propose Bridge3D, a method that addresses this gap by pre-training 3D models with features, semantic masks, and captions sourced from foundation models. Specifically, our approach uses semantic masks from these models to guide the masking and reconstruction process in a masked autoencoder, which lets the network concentrate on foreground objects and thereby improves 3D representation learning. We also bridge the 3D-text gap at the scene level by using image captioning foundation models. To further facilitate knowledge distillation from well-learned 2D and text representations to the 3D model, we introduce a method that employs foundation models to generate highly accurate object-level masks and object-level semantic text information. Our approach outperforms state-of-the-art methods on 3D object detection and semantic segmentation. For instance, on the ScanNet dataset, our method surpasses the previous state-of-the-art method, PiMAE, by 5.3%.
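The abstract does not spell out how semantic masks steer the masking step, so the following is only a minimal sketch of one plausible reading: tokens flagged as foreground are chosen for masking more often than background tokens. The function name `foreground_guided_mask` and the parameters `mask_ratio`, `fg_weight`, and `seed` are hypothetical, and weighted sampling without replacement (the Efraimidis-Spirakis key trick) is an assumed stand-in, not the authors' procedure.

```python
import random


def foreground_guided_mask(is_foreground, mask_ratio=0.6, fg_weight=3.0, seed=None):
    """Return one boolean per token; True means the token is masked.

    Hypothetical illustration: foreground tokens get sampling weight
    `fg_weight`, background tokens get weight 1.0, and exactly
    round(mask_ratio * n_tokens) tokens are masked.
    """
    rng = random.Random(seed)
    n_tokens = len(is_foreground)
    n_mask = round(mask_ratio * n_tokens)

    # Efraimidis-Spirakis sampling: draw u in [0, 1), use u ** (1 / w) as the
    # key, and keep the n_mask largest keys. Larger weights give larger keys
    # on average, so foreground tokens are masked more often.
    keyed = []
    for index, fg in enumerate(is_foreground):
        weight = fg_weight if fg else 1.0
        keyed.append((rng.random() ** (1.0 / weight), index))
    keyed.sort(reverse=True)

    masked = {index for _, index in keyed[:n_mask]}
    return [index in masked for index in range(n_tokens)]


# Example: 10 tokens, the first 5 flagged as foreground by a semantic mask.
flags = [True] * 5 + [False] * 5
mask = foreground_guided_mask(flags, mask_ratio=0.6, fg_weight=3.0, seed=0)
print(sum(mask), "of", len(mask), "tokens masked")
```

In a masked autoencoder, the masked tokens would be hidden from the encoder and reconstructed by the decoder. Setting `fg_weight` to 1.0 reduces the sketch to uniform random masking.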

