T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning

The scarcity of annotated data in LiDAR point cloud understanding hinders effective representation learning. Consequently, researchers have been actively investigating effective self-supervised pre-training paradigms. Nevertheless, the temporal information inherent in LiDAR point cloud sequences has consistently been overlooked. To better exploit this property, we propose an effective pre-training strategy, Temporal Masked Auto-Encoders (T-MAE), which takes temporally adjacent frames as input and learns temporal dependency. For the two-frame input, we build a SiamWCA backbone consisting of a Siamese encoder and a windowed cross-attention (WCA) module. Because the ego-vehicle's motion changes the viewpoint on the same instance, temporal modeling also serves as a robust and natural data augmentation that improves the comprehension of target objects. SiamWCA is a powerful architecture but relies heavily on annotated data; our T-MAE pre-training strategy alleviates this demand. Comprehensive experiments demonstrate that T-MAE achieves the best performance among competitive self-supervised approaches on both the Waymo and ONCE datasets.
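The abstract names two ingredients of SiamWCA: a Siamese encoder applied to both frames and a windowed cross-attention module that fuses them. The NumPy sketch below illustrates only those two ideas in a T-MAE-style masked setting and is not the authors' implementation. Every detail is an assumption chosen for illustration: points are pre-voxelized into a bird's-eye-view (BEV) grid of tokens, the encoder is a single shared linear layer with ReLU, the current frame (not the previous one) is masked at a hypothetical 75% ratio, masked tokens are simply zeroed, and attention is single-head with a residual connection. All function names (`siamese_encode`, `windowed_cross_attention`, `tmae_style_forward`, and so on) are hypothetical.

```python
import numpy as np


def siamese_encode(tokens, weight):
    """Shared-weight (Siamese) encoder: the SAME weight matrix is applied to
    every frame. Hypothetical stand-in: one linear layer followed by ReLU."""
    return np.maximum(tokens @ weight, 0.0)


def random_mask(num_tokens, ratio, rng):
    """Boolean mask over tokens; True marks a token hidden from the encoder."""
    mask = np.zeros(num_tokens, dtype=bool)
    mask[rng.permutation(num_tokens)[: int(round(num_tokens * ratio))]] = True
    return mask


def window_partition(grid, w):
    """(H, W, C) BEV grid -> (num_windows, w*w, C); H and W must divide by w."""
    H, W, C = grid.shape
    x = grid.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)


def window_merge(windows, H, W, w):
    """Inverse of window_partition: (num_windows, w*w, C) -> (H, W, C)."""
    C = windows.shape[-1]
    x = windows.reshape(H // w, W // w, w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def windowed_cross_attention(query_grid, context_grid, w, wq, wk, wv):
    """Single-head cross-attention restricted to non-overlapping w x w windows:
    queries come from query_grid, keys/values from the SAME window of context_grid."""
    H, W, _ = query_grid.shape
    q = window_partition(query_grid, w) @ wq      # (nW, w*w, D)
    k = window_partition(context_grid, w) @ wk    # (nW, w*w, D)
    v = window_partition(context_grid, w) @ wv    # (nW, w*w, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores) @ v                     # (nW, w*w, D)
    return window_merge(out, H, W, w)


def tmae_style_forward(prev_tokens, curr_tokens, params, H, W, w, mask_ratio, rng):
    """One hypothetical two-frame pass: mask the current frame, encode both frames
    with shared weights, then let current-frame features query the previous frame."""
    mask = random_mask(H * W, mask_ratio, rng)
    # Zeroing masked tokens is a simplification of a learned mask token.
    visible_curr = np.where(mask[:, None], 0.0, curr_tokens)
    f_prev = siamese_encode(prev_tokens, params["enc"]).reshape(H, W, -1)
    f_curr = siamese_encode(visible_curr, params["enc"]).reshape(H, W, -1)
    fused = f_curr + windowed_cross_attention(
        f_curr, f_prev, w, params["wq"], params["wk"], params["wv"])  # residual
    return fused, mask


# Usage: an 8 x 8 BEV grid, 4 input channels, 16 feature channels, 4 x 4 windows.
rng = np.random.default_rng(0)
H, W, C_in, C = 8, 8, 4, 16
params = {name: rng.normal(size=shape) * 0.1 for name, shape in
          [("enc", (C_in, C)), ("wq", (C, C)), ("wk", (C, C)), ("wv", (C, C))]}
prev = rng.normal(size=(H * W, C_in))
curr = rng.normal(size=(H * W, C_in))
fused, mask = tmae_style_forward(prev, curr, params, H, W, w=4, mask_ratio=0.75, rng=rng)
print(fused.shape, mask.sum())  # (8, 8, 16) 48
```

Restricting attention to windows keeps the cost proportional to the number of windows times the window area, rather than quadratic in the full grid, and relies on the assumption that an object seen from two nearby ego poses falls mostly within the same spatial window. A pre-training loss (for example, reconstructing the masked current-frame points from `fused`) would sit on top of this; it is omitted here because the abstract does not specify it.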

