NSM4D: Neural Scene Model Based Online 4D Point Cloud Sequence Understanding

Understanding 4D point cloud sequences online is of significant practical value in scenarios such as VR/AR, robotics, and autonomous driving. The key goal is to continuously analyze the geometry and dynamics of a 3D scene as unstructured and redundant point cloud sequences arrive. The main challenge is to model the long-term history effectively while keeping computational costs manageable. To address this challenge, we introduce a generic online 4D perception paradigm called NSM4D. NSM4D serves as a plug-and-play strategy that can be adapted to existing 4D backbones, significantly enhancing their online perception capabilities in both indoor and outdoor scenarios. To efficiently capture the redundant 4D history, we propose a neural scene model that factorizes geometry and motion information by constructing geometry tokens that store geometry and motion features separately. Exploiting the history then becomes as straightforward as querying the neural scene model. As the sequence progresses, the neural scene model dynamically deforms to align with new observations, providing the historical context and updating itself with those observations. Thanks to its token representation, NSM4D is also robust to low-level sensor noise, and it maintains a compact size through a geometric sampling scheme. We integrate NSM4D with state-of-the-art 4D perception backbones and demonstrate significant improvements on various online perception benchmarks in indoor and outdoor settings. Notably, we achieve a 9.6% accuracy improvement for HOI4D online action segmentation and a 3.4% mIoU improvement for SemanticKITTI online semantic segmentation. Furthermore, we show that NSM4D inherently scales well to sequences longer than those seen in the training set, which is crucial for real-world applications.
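To make the query, deform, and update cycle described above more concrete, the following is a minimal toy sketch in Python with NumPy. It is not the NSM4D implementation. The class name ToySceneMemory, the k-nearest-neighbor averaging used for querying, the caller-supplied per-token flow used for deformation, and the use of farthest point sampling as the geometric sampling scheme are all assumptions made for illustration only.

```python
import numpy as np


def farthest_point_sampling(pos, m):
    """Return indices of at most m points that are spread out over pos (N, 3)."""
    n = len(pos)
    if n <= m:
        return np.arange(n)
    chosen = np.zeros(m, dtype=int)  # the first chosen point is index 0
    dist = np.linalg.norm(pos - pos[0], axis=1)
    for i in range(1, m):
        chosen[i] = int(np.argmax(dist))  # farthest from everything chosen so far
        dist = np.minimum(dist, np.linalg.norm(pos - pos[chosen[i]], axis=1))
    return chosen


class ToySceneMemory:
    """Hypothetical token memory: each token holds a 3D position plus
    separately stored geometry and motion feature vectors."""

    def __init__(self, max_tokens=256):
        self.max_tokens = max_tokens
        self.pos = None  # (M, 3) token positions
        self.geo = None  # (M, Cg) geometry features
        self.mot = None  # (M, Cm) motion features

    def query(self, points, k=3):
        """For each query point, average the features of its k nearest tokens."""
        if self.pos is None:
            return None  # no history yet
        k = min(k, len(self.pos))
        d = np.linalg.norm(points[:, None, :] - self.pos[None, :, :], axis=-1)
        idx = np.argsort(d, axis=1)[:, :k]  # (N, k) nearest token indices
        return self.geo[idx].mean(axis=1), self.mot[idx].mean(axis=1)

    def update(self, points, geo_feat, mot_feat, token_flow=None):
        """Deform stored tokens, merge the new frame, then resample to stay compact."""
        if self.pos is None:
            pos, geo, mot = points, geo_feat, mot_feat
        else:
            # Assumption: the caller supplies a per-token displacement (M, 3)
            # that aligns the stored tokens with the new observation.
            old = self.pos if token_flow is None else self.pos + token_flow
            pos = np.concatenate([old, points])
            geo = np.concatenate([self.geo, geo_feat])
            mot = np.concatenate([self.mot, mot_feat])
        keep = farthest_point_sampling(pos, self.max_tokens)
        self.pos, self.geo, self.mot = pos[keep], geo[keep], mot[keep]
```

In this sketch the memory size is capped by max_tokens regardless of how many frames have been processed, which mirrors the compactness property claimed above. A real system would obtain the features and the token flow from learned networks rather than from the caller.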
