PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds

Pre-training is crucial in 3D-related fields such as autonomous driving, where point cloud annotation is costly and challenging. Many recent studies on point cloud pre-training, however, have overlooked the issue of incompleteness: LiDAR captures only a fraction of the points in a scene, which leads to ambiguity during training. Images, on the other hand, offer more comprehensive information and richer semantics that can help point cloud encoders address this inherent incompleteness. Yet incorporating images into point cloud pre-training presents its own challenges, because occlusions can cause misalignments between points and pixels. In this work, we propose PRED, a novel occlusion-aware, image-assisted pre-training framework for outdoor point clouds. The main ingredient of our framework is semantic rendering conditioned on a Bird's-Eye-View (BEV) feature map, which leverages the semantics of images for supervision through neural rendering. We further enhance our model's performance by incorporating point-wise masking with a high mask ratio (95%). Extensive experiments demonstrate PRED's superiority over prior point cloud pre-training methods, providing significant improvements on various large-scale datasets for 3D perception tasks. Code will be available at https://github.com/PRED4pc/PRED.
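
As a rough illustration of the point-wise masking mentioned in the abstract, the following minimal sketch randomly drops 95% of the points in a LiDAR sweep and keeps the rest as encoder input. It is not taken from the PRED code base: the function name `mask_points`, the uniform random sampling strategy, and the `(N, 4)` layout (x, y, z, intensity) are assumptions made for this example only.

```python
import numpy as np


def mask_points(points, mask_ratio=0.95, seed=None):
    """Randomly keep a (1 - mask_ratio) fraction of the points.

    Hypothetical helper for illustration; uniform sampling is an
    assumption, not necessarily the strategy used by PRED.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    # Number of visible (unmasked) points, at least one.
    n_keep = max(1, int(round(n * (1.0 - mask_ratio))))
    # Sample indices without replacement and keep them in original order.
    keep_idx = np.sort(rng.choice(n, size=n_keep, replace=False))
    return points[keep_idx], keep_idx


# Synthetic sweep: 10,000 points with assumed columns (x, y, z, intensity).
points = np.random.default_rng(0).uniform(-50.0, 50.0, size=(10000, 4))
visible, keep_idx = mask_points(points, mask_ratio=0.95, seed=0)
print(visible.shape)
```

With a 95% mask ratio, only 500 of the 10,000 synthetic points remain visible, which gives a sense of how sparse the encoder input becomes under this setting.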


