H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting

Dynamic scene reconstruction remains a persistent challenge in 3D vision. Deformable 3D Gaussian Splatting has emerged as an effective approach to this task, offering real-time rendering and high visual fidelity. It decomposes a dynamic scene into a static representation in a canonical space and a time-varying scene motion. Scene motion is defined as the collective movement of all Gaussian points; to represent it compactly, existing approaches commonly adopt implicit neural fields or sparse control points. However, these methods recover all motion information through gradient-based optimization. Because this optimization has many degrees of freedom, they struggle to converge on real-world datasets with complex motion. To keep the motion representation compact while addressing these convergence issues, this paper proposes heterogeneous 3D control points, termed H3D control points, whose attributes are obtained with a hybrid strategy that combines optical flow back-projection and gradient-based optimization. This design decouples directly observable motion components from those that are geometrically occluded. Specifically, the components of 3D motion that project onto the image plane are obtained directly by back-projecting optical flow, while the unobservable components are refined through gradient-based optimization. Experiments on the Neu3DV and CMU-Panoptic datasets show that our method outperforms state-of-the-art deformable 3D Gaussian Splatting techniques. Notably, it converges within 100 iterations and processes each frame in 2 seconds on a single NVIDIA RTX 4070 GPU.
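The split between observable and unobservable motion can be illustrated with a minimal sketch. It assumes a single pinhole camera with intrinsics K, points expressed in that camera's frame, and a known depth at time t. Under these assumptions, a point's optical flow fixes the pixel ray it moves onto (the observable part), leaving only its depth along that ray free (the unobservable part). The helper names (project, back_project, motion_from_flow, refine_depth) are hypothetical. The quadratic loss against a known target position is only a stand-in for whatever image-based loss the actual gradient-based refinement uses. This is not the H3D-DGS implementation.

```python
import numpy as np


def project(K, X):
    """Pinhole projection of a camera-frame 3D point X (shape (3,)) to a pixel (shape (2,))."""
    x = K @ X
    return x[:2] / x[2]


def back_project(K, uv, depth):
    """Lift pixel uv to 3D at the given depth (z-coordinate in the camera frame)."""
    ray = np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))  # ray scaled so z == 1
    return depth * ray


def motion_from_flow(K, X_t, flow, depth_next):
    """3D motion of X_t implied by its 2D optical flow and an assumed next-frame depth.

    The flow determines where the point lands in the image (observable part);
    depth_next is the single remaining free quantity (unobservable part).
    """
    uv_next = project(K, X_t) + flow
    return back_project(K, uv_next, depth_next) - X_t


def refine_depth(K, uv_next, depth_init, target, lr=0.1, iters=200):
    """Gradient descent on the depth along the flow-defined ray.

    Minimizes the hypothetical loss ||depth * ray - target||^2, a stand-in for an
    image-based loss. The step is normalized by ray.ray so that each iteration
    shrinks the depth error by a constant factor (1 - 2 * lr).
    """
    ray = np.linalg.solve(K, np.array([uv_next[0], uv_next[1], 1.0]))
    d = float(depth_init)
    step = lr / ray.dot(ray)
    for _ in range(iters):
        residual = d * ray - target
        grad = 2.0 * residual.dot(ray)  # dL/dd
        d -= step * grad
    return d
```

In this sketch the flow constrains two of the three motion components per point, so the gradient step searches over a single scalar per point instead of a full 3D displacement. This is one way to see why fixing the observable components could ease convergence.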
