Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos

Recently, the community has made tremendous progress in developing effective methods for point cloud video understanding that learn from massive amounts of labeled data. However, annotating point cloud videos is notoriously expensive. Moreover, training on one or only a few traditional tasks (e.g., classification) may be insufficient to learn the subtle details of the spatio-temporal structure present in point cloud videos. In this paper, we propose a Masked Spatio-Temporal Structure Prediction (MaST-Pre) method to capture the structure of point cloud videos without human annotations. MaST-Pre is based on spatio-temporal point-tube masking and consists of two self-supervised learning tasks. First, by reconstructing masked point tubes, our method captures the appearance information of point cloud videos. Second, to learn motion, we propose a temporal cardinality difference prediction task that estimates the change in the number of points within a point tube. In this way, MaST-Pre is forced to model both the spatial and the temporal structure of point cloud videos. Extensive experiments on MSRAction-3D, NTU-RGBD, NvGesture, and SHREC'17 demonstrate the effectiveness of the proposed method.
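To make the temporal cardinality difference idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a point tube can be approximated by a ball of fixed `radius` around a fixed `center` that is tracked across frames, and that the target is the frame-to-frame change in the number of points falling inside that ball. The function names, the ball-shaped neighborhood, and the toy data are all illustrative assumptions; the abstract does not specify how the tube or the count is defined.

```python
import numpy as np


def tube_cardinalities(video, center, radius):
    """Count, for each frame, the points lying within `radius` of `center`.

    `video` is a list of (N_t, 3) arrays, one per frame; frames may contain
    different numbers of points. The ball-shaped tube is an assumption.
    """
    counts = []
    for frame in video:
        dists = np.linalg.norm(frame - center, axis=1)
        counts.append(int(np.sum(dists <= radius)))
    return np.array(counts)


def temporal_cardinality_difference(video, center, radius):
    """Change in point count inside the tube between consecutive frames."""
    return np.diff(tube_cardinalities(video, center, radius))


# Toy three-frame "video" (hypothetical data, for illustration only).
video = [
    np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [2.0, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [4.0, 0.0, 0.0]]),
]
center = np.zeros(3)

counts = tube_cardinalities(video, center, radius=0.5)
diffs = temporal_cardinality_difference(video, center, radius=0.5)
print(counts.tolist())  # per-frame point counts inside the tube
print(diffs.tolist())   # signed change between consecutive frames
```

In this toy case a positive difference means points moved into the tube between two frames and a negative one means they left it, which is the kind of motion cue the prediction task is described as targeting.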

