Accurately detecting objects in the environment is a key challenge for autonomous vehicles. However, obtaining annotated data for detection is expensive and time-consuming. We introduce PatchContrast, a novel self-supervised point cloud pre-training framework for 3D object detection. We propose to utilize two levels of abstraction to learn discriminative representations from unlabeled data: proposal-level and patch-level. The proposal level aims to localize objects in relation to their surroundings, whereas the patch level adds information about the internal connections between an object's components, thereby distinguishing between different objects based on their individual components. We demonstrate how these levels can be integrated into self-supervised pre-training for various backbones to enhance the downstream 3D detection task. We show that our method outperforms existing state-of-the-art models on three commonly used 3D detection datasets.
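The abstract does not state the training objective used at each level. As a minimal sketch, assuming both the proposal level and the patch level use a standard InfoNCE contrastive loss between embeddings of the same items under two augmented views (an assumption, not necessarily the formulation used in PatchContrast), the two levels could be combined into a single pre-training loss as shown below. The function names `info_nce` and `two_level_loss`, the `temperature` value, and the `patch_weight` parameter are all hypothetical.

```python
import numpy as np


def info_nce(za, zb, temperature=0.1):
    """InfoNCE loss where row i of za and row i of zb form a positive pair.

    za, zb: arrays of shape (N, D) with embeddings of the same N items
    (e.g. proposals or patches) under two augmented views. Every other
    row of zb serves as a negative for row i of za.
    """
    # L2-normalise so that dot products are cosine similarities
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature  # (N, N) similarity matrix
    # Numerically stable row-wise log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal
    return -np.mean(np.diag(log_prob))


def two_level_loss(prop_a, prop_b, patch_a, patch_b,
                   patch_weight=1.0, temperature=0.1):
    """Hypothetical sum of a proposal-level and a patch-level InfoNCE term.

    prop_*:  (P, D) proposal embeddings from view a / view b
    patch_*: (Q, D) patch embeddings (patches inside the proposals)
    """
    proposal_term = info_nce(prop_a, prop_b, temperature)
    patch_term = info_nce(patch_a, patch_b, temperature)
    return proposal_term + patch_weight * patch_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    props = rng.normal(size=(8, 16))    # 8 hypothetical proposals
    patches = rng.normal(size=(32, 16))  # 32 hypothetical patches
    # Perfectly aligned views give a lower loss than unrelated ones
    aligned = two_level_loss(props, props, patches, patches)
    unrelated = two_level_loss(props, rng.normal(size=(8, 16)),
                               patches, rng.normal(size=(32, 16)))
    print(aligned < unrelated)
```

In this sketch the proposal-level term pulls together the two views of each whole proposal, while the patch-level term does the same for finer-grained parts inside proposals; `patch_weight` (hypothetical) trades off the two levels.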