Point cloud video representation learning is challenging because point clouds have complex structures and no canonical spatial ordering. Traditional methods struggle to model frame-to-frame correlations and to track point-wise correspondences. Recently, partial differential equations (PDEs) have offered a new perspective for modeling spatio-temporal data uniformly under given constraints. Because tracking explicit point correspondences remains difficult, we propose to formalize point cloud video representation learning as a PDE-solving problem. Inspired by fluid analysis, where PDEs describe how a spatial shape deforms over time, we use PDEs to solve for the variations of spatial points under the influence of temporal information. By modeling spatio-temporal correlations, we aim to regularize spatial variations with temporal features and thereby enhance representation learning for point cloud videos. We introduce Motion PointNet, composed of a PointNet-like encoder and a PDE-solving module. First, we construct a lightweight yet effective encoder to model the initial state of the spatial variations. We then develop the PDE-solving module in a parameterized latent space, tailored to the spatio-temporal correlations inherent in point cloud videos. The PDE-solving process is guided and refined by a contrastive learning structure, which reshapes the feature distribution and thereby improves the feature representation of point cloud video data. Motion PointNet achieves an accuracy of 97.52% on the MSRAction-3D dataset, surpassing the current state of the art while consuming minimal resources (only 0.72M parameters and 0.82G FLOPs).
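The abstract does not specify the encoder beyond calling it "PointNet-like". As a rough illustration only, the sketch below assumes the generic PointNet idea: one shared layer applied to every point, followed by channel-wise max pooling, so that each frame's feature does not depend on point order. The names `make_weights`, `shared_layer`, `encode_frame`, and `encode_video`, the layer sizes, and the random data are all hypothetical. This is not the authors' encoder, and it does not include the PDE-solving module or the contrastive learning structure.

```python
import random

def make_weights(in_dim, out_dim, seed=0):
    """Random weights for one shared linear layer (hypothetical initialisation)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

def shared_layer(point, weights):
    """Apply the same linear map followed by ReLU to a single point."""
    return [max(0.0, sum(w * x for w, x in zip(row, point))) for row in weights]

def encode_frame(points, weights):
    """Per-point features, then channel-wise max pooling over all points."""
    feats = [shared_layer(p, weights) for p in points]
    return [max(channel) for channel in zip(*feats)]

def encode_video(frames, weights):
    """One pooled feature vector per frame; frames are encoded independently."""
    return [encode_frame(frame, weights) for frame in frames]

# Toy point cloud video: 4 frames, 16 points per frame, xyz coordinates.
weights = make_weights(in_dim=3, out_dim=8)
rng = random.Random(1)
frames = [[[rng.random() for _ in range(3)] for _ in range(16)] for _ in range(4)]
features = encode_video(frames, weights)

# Reordering the points inside each frame should not change the pooled features.
shuffled = [rng.sample(frame, len(frame)) for frame in frames]
shuffled_features = encode_video(shuffled, weights)
```

Max pooling is a symmetric function, which is why the unordered arrangement of points mentioned in the abstract does not affect the per-frame feature in this sketch. Temporal correlations between frames are not modeled here at all; in the abstract that role belongs to the PDE-solving module.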