The geometry and color information provided by point clouds are both crucial for 3D scene understanding. These two kinds of information characterize different aspects of a point cloud, but existing methods lack an elaborate design for distinguishing them and exploiting their relevance to each other. Hence, we explore a 3D self-supervised paradigm that better utilizes the relations within point cloud information. Specifically, we propose a universal 3D scene pre-training framework via Geometry-Color Contrast (Point-GCC), which aligns geometry and color information using a Siamese network. To accommodate practical application tasks, we design (i) hierarchical supervision, with point-level contrast and reconstruction and object-level contrast based on a novel deep clustering module, to close the gap between pre-training and downstream tasks; and (ii) an architecture-agnostic backbone that adapts to various downstream models. Benefiting from the object-level representation associated with downstream tasks, Point-GCC can directly evaluate model performance, and the results demonstrate the effectiveness of our method. Transfer learning results on a wide range of tasks also show consistent improvements across all datasets, e.g., new state-of-the-art object detection results on the SUN RGB-D and S3DIS datasets. Code will be released at https://github.com/Asterisci/Point-GCC.
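As an illustration of the point-level geometry-color alignment summarized above, the following is a minimal sketch of a symmetric contrastive objective between per-point geometry and color embeddings. The abstract does not specify the loss, so the InfoNCE-style form, the `temperature` value, and all function names here are assumptions made for illustration, not the Point-GCC implementation. The two input lists stand in for the outputs of the two Siamese branches.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def geometry_color_contrast(geo_feats, color_feats, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumed form, not taken from the paper).

    geo_feats[i] and color_feats[i] are embeddings of the same point, so the
    positive pairs lie on the diagonal of the similarity matrix.
    """
    g = [normalize(v) for v in geo_feats]
    c = [normalize(v) for v in color_feats]
    # Cosine similarity between every geometry and color embedding.
    logits = [
        [sum(a * b for a, b in zip(gi, cj)) / temperature for cj in c]
        for gi in g
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the geometry-to-color and color-to-geometry directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy data: hypothetical embeddings, not real point cloud features.
random.seed(0)
n_points, dim = 32, 16
geo = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_points)]
aligned = [[x + 0.01 * random.gauss(0, 1) for x in v] for v in geo]
unrelated = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_points)]

aligned_loss = geometry_color_contrast(geo, aligned)
unrelated_loss = geometry_color_contrast(geo, unrelated)
```

In this toy setting the loss is lower when the color embeddings closely track the geometry embeddings than when the two are unrelated, which is the behavior an alignment objective of this kind is meant to reward.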