Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration

This paper introduces a self-supervised learning framework for enhancing 3D perception in autonomous driving scenes. Our approach, named NCLR, is built around 2D-3D neural calibration, a novel pretext task that estimates the rigid transformation aligning the camera and LiDAR coordinate systems. First, we propose a learnable transformation alignment that bridges the domain gap between image and point cloud data by mapping their features into a unified representation space, where they can be effectively compared and matched. Second, we use the fused features to identify the region where the image and the point cloud overlap. Third, we establish dense 2D-3D correspondences and use them to estimate the rigid transformation. The framework thus learns not only fine-grained point-to-pixel matching but also the holistic alignment of the image and point cloud, that is, their relative pose. We demonstrate NCLR's efficacy by applying the pre-trained backbone to downstream tasks, including LiDAR-based 3D semantic segmentation, object detection, and panoptic segmentation. Comprehensive experiments on multiple datasets show that NCLR outperforms existing self-supervised methods. The results confirm that joint learning from different modalities significantly improves the network's understanding ability and the effectiveness of the learned representations. Code will be available at https://github.com/Eaphan/NCLR.
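As a rough illustration of the geometry behind this pretext task, the sketch below projects LiDAR points into an image with a camera-LiDAR rigid transform (R, t), treats the points that land inside the image as the overlap region, and then recovers (R, t) from the resulting 2D-3D correspondences with the classic Direct Linear Transform (DLT). This is a stand-in under stated assumptions, not NCLR's implementation: the abstract does not name its pose solver, the pinhole intrinsics K are assumed known, the correspondences here are generated synthetically rather than predicted by a network, and the function names `project_points` and `estimate_pose_dlt` are hypothetical.

```python
import numpy as np


def project_points(points_lidar, R, t, K, width, height):
    """Project LiDAR points (N, 3) into a width x height image (hypothetical helper).

    R (3, 3) and t (3,) map LiDAR coordinates to camera coordinates; K is an
    assumed 3x3 pinhole intrinsic matrix. Returns pixel coordinates (N, 2) and a
    boolean mask of points in front of the camera that land inside the image,
    i.e. the 2D-3D overlap region.
    """
    cam = points_lidar @ R.T + t                    # LiDAR frame -> camera frame
    in_front = cam[:, 2] > 1e-6
    depth = np.where(in_front, cam[:, 2], 1.0)      # avoid dividing by z <= 0
    uv = (cam @ K.T)[:, :2] / depth[:, None]        # perspective division
    inside = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return uv, in_front & inside


def estimate_pose_dlt(points_3d, pixels, K):
    """Recover (R, t) from >= 6 non-coplanar 2D-3D correspondences via DLT.

    A classic linear solver used here for illustration; not necessarily the
    solver used by NCLR.
    """
    n = len(points_3d)
    # Back-project pixels to normalized image coordinates: K^-1 [u, v, 1]^T
    rays = np.hstack([pixels, np.ones((n, 1))]) @ np.linalg.inv(K).T
    X = np.hstack([points_3d, np.ones((n, 1))])     # homogeneous 3D points
    zeros = np.zeros_like(X)
    # Each correspondence gives two linear constraints on the rows of [R | t]
    A = np.vstack([
        np.hstack([X, zeros, -rays[:, :1] * X]),
        np.hstack([zeros, X, -rays[:, 1:2] * X]),
    ])
    # Least-squares null vector: right singular vector of the smallest singular value
    M = np.linalg.svd(A, full_matrices=False)[2][-1].reshape(3, 4)  # s * [R | t]
    if np.linalg.det(M[:, :3]) < 0:                 # det(sR) = s^3, so force s > 0
        M = -M
    U, S, Vt = np.linalg.svd(M[:, :3])
    R = U @ Vt                                      # closest rotation to M[:, :3] / s
    t = M[:, 3] / S.mean()                          # undo the unknown scale s
    return R, t
```

DLT needs at least six non-coplanar correspondences and is sensitive to outliers. With noisy, network-predicted matches one would normally wrap such a solver in RANSAC and refine the result, for example with OpenCV's `cv2.solvePnPRansac`.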