Dynamic 3D point cloud sequences serve as one of the most common and practical representation modalities of dynamic real-world environments. However, their unstructured nature in both spatial and temporal domains poses significant challenges to effective and efficient processing. Existing deep point cloud sequence modeling approaches imitate the mature 2D video learning mechanisms by developing complex spatio-temporal point neighbor grouping and feature aggregation schemes, often resulting in methods lacking effectiveness, efficiency, and expressive power. In this paper, we propose a novel generic representation called \textit{Structured Point Cloud Videos} (SPCVs). Intuitively, by leveraging the fact that 3D geometric shapes are essentially 2D manifolds, SPCV re-organizes a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where the pixel values correspond to the 3D coordinates of points. The structured nature of our SPCV representation allows for the seamless adaptation of well-established 2D image/video techniques, enabling efficient and effective processing and analysis of 3D point cloud sequences. To achieve such re-organization, we design a self-supervised learning pipeline that is geometrically regularized and driven by self-reconstructive and deformation field learning objectives. Additionally, we construct SPCV-based frameworks for both low-level and high-level 3D point cloud sequence processing and analysis tasks, including action recognition, temporal interpolation, and compression. Extensive experiments demonstrate the versatility and superiority of the proposed SPCV, which has the potential to offer new possibilities for deep learning on unstructured 3D point cloud sequences. Code will be released at https://github.com/ZENGYIMING-EAMON/SPCV.
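To make the data layout described in the abstract concrete, the following is a minimal sketch, not the authors' learned pipeline. It assumes an SPCV can be held as a NumPy array of shape `(T, H, W, 3)`, where each pixel stores the xyz coordinates of one point and pixel `(i, j)` follows the same surface point over time. The function names `spcv_to_point_clouds` and `interpolate_frames` are hypothetical, and the linear blend stands in for the paper's temporal interpolation framework only as an illustration of why a temporally consistent grid makes 2D video operations applicable.

```python
import numpy as np


def spcv_to_point_clouds(spcv):
    """Flatten each (H, W, 3) frame into an (H*W, 3) point cloud.

    Assumes `spcv` has shape (T, H, W, 3); this layout is an
    illustrative assumption, not the authors' released format.
    """
    t, h, w, c = spcv.shape
    if c != 3:
        raise ValueError("last axis must hold xyz coordinates")
    return spcv.reshape(t, h * w, 3)


def interpolate_frames(spcv, alpha=0.5):
    """Blend each pair of consecutive frames pixel by pixel.

    This is only meaningful if pixel (i, j) tracks the same surface
    point across frames (the temporal consistency the abstract names).
    Returns an array of shape (T - 1, H, W, 3).
    """
    return (1.0 - alpha) * spcv[:-1] + alpha * spcv[1:]


# Toy data: a flat unit patch that moves 0.1 along z per frame.
T, H, W = 4, 8, 8
u, v = np.meshgrid(np.linspace(0.0, 1.0, W), np.linspace(0.0, 1.0, H))
frames = np.stack(
    [np.stack([u, v, np.full_like(u, 0.1 * t)], axis=-1) for t in range(T)]
)

clouds = spcv_to_point_clouds(frames)   # per-frame unordered view
midframes = interpolate_frames(frames)  # in-between frames on the same grid
```

Because the grid is regular, any operator that accepts a `(T, H, W, 3)` tensor, such as a 2D or 3D convolution or a video codec, could in principle be applied directly, which is the property the abstract relies on.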