The geometry and color information provided by point clouds are both crucial for 3D scene understanding. These two kinds of information characterize different aspects of point clouds, but existing methods lack a careful design for their discrimination and relevance. Hence, we explore a 3D self-supervised paradigm that better exploits the relations between the two. Specifically, we propose a universal 3D scene pre-training framework via Geometry-Color Contrast (Point-GCC), which aligns geometry and color information using a Siamese network. To address practical application tasks, we design (i) hierarchical supervision, combining point-level contrast and reconstruction with object-level contrast based on a novel deep clustering module, to close the gap between pre-training and downstream tasks; and (ii) an architecture-agnostic backbone that adapts to various downstream models. Benefiting from the object-level representation associated with downstream tasks, Point-GCC allows model performance to be evaluated directly, and the results demonstrate the effectiveness of our method. Transfer learning results on a wide range of tasks also show consistent improvements across all datasets, e.g., new state-of-the-art object detection results on the SUN RGB-D and S3DIS datasets. Code will be released at https://github.com/Asterisci/Point-GCC.
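To make the point-level geometry-color contrast concrete, the following is a minimal sketch and not the released Point-GCC implementation. It assumes that the two Siamese branches each produce one feature vector per point and that alignment is scored with a symmetric InfoNCE loss, which is a common choice for contrastive objectives; the abstract does not state the exact loss. The function names `l2_normalize` and `info_nce` and the `temperature` parameter are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(geo_feats, color_feats, temperature=0.1):
    """Symmetric InfoNCE loss between per-point geometry and color features.

    Assumption (not from the paper): geo_feats[i] and color_feats[i] describe
    the same point and form the positive pair; every other pairing within the
    batch is treated as a negative.
    """
    g = [l2_normalize(v) for v in geo_feats]
    c = [l2_normalize(v) for v in color_feats]
    n = len(g)

    # Cosine similarity between every geometry/color pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(g[i], c[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(rows):
        # Mean of -log softmax(row)[i], using a numerically stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]
        return total / len(rows)

    # Average the geometry-to-color and color-to-geometry directions.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed))


# Toy features: three points with 2-D embeddings from each branch.
geometry = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
color_aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
color_shuffled = [color_aligned[1], color_aligned[2], color_aligned[0]]

loss_aligned = info_nce(geometry, color_aligned)
loss_shuffled = info_nce(geometry, color_shuffled)
```

In this toy setting the loss is near zero when each point's color feature matches its geometry feature and grows when the pairing is shuffled, which is the behavior a contrastive alignment objective is meant to reward. A practical version would operate on batched tensors from the backbone and would be combined with the reconstruction and object-level terms mentioned in the abstract.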