Novel view synthesis for dynamic scenes remains a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique for representing static scenes and enabling high-quality, real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians, respectively. Our key idea is to use sparse control points, significantly fewer in number than the Gaussians, to learn compact 6-DoF transformation bases, which can be locally interpolated through learned interpolation weights to yield the motion field of the 3D Gaussians. We employ a deformation MLP to predict time-varying 6-DoF transformations for each control point, which reduces learning complexity, enhances learning ability, and facilitates obtaining temporally and spatially coherent motion patterns. We then jointly learn the 3D Gaussians, the canonical-space locations of the control points, and the deformation MLP to reconstruct the appearance, geometry, and dynamics of 3D scenes. During learning, the location and number of control points are adaptively adjusted to accommodate varying motion complexities in different regions, and an as-rigid-as-possible (ARAP) loss is developed to enforce the spatial continuity and local rigidity of the learned motions. Finally, thanks to the explicit sparse motion representation and its decomposition from appearance, our method enables user-controlled motion editing while retaining high-fidelity appearance. Extensive experiments demonstrate that our approach outperforms existing approaches on novel view synthesis with a high rendering speed and enables novel appearance-preserving motion editing applications. Project page: https://yihua7.github.io/SC-GS-web/
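As a concrete illustration of the control-point idea described above, the sketch below shows one plausible way to blend per-control-point 6-DoF transforms into per-Gaussian motion using k-nearest-neighbor RBF weights. It is a minimal sketch under assumed conventions, not the paper's implementation; the function and parameter names (interpolate_gaussian_motion, ctrl_radius, k) are hypothetical.

```python
import torch

def interpolate_gaussian_motion(gauss_xyz, ctrl_xyz, ctrl_rot, ctrl_trans, ctrl_radius, k=4):
    """Hypothetical sketch: each Gaussian's deformed center is a weighted blend of the
    6-DoF rigid transforms of its k nearest control points (linear-blend-skinning style),
    with weights from a Gaussian RBF kernel over canonical-space distances.

    gauss_xyz:   (N, 3) canonical Gaussian centers
    ctrl_xyz:    (M, 3) canonical control-point locations
    ctrl_rot:    (M, 3, 3) per-control-point rotations at time t (e.g., from a deformation MLP)
    ctrl_trans:  (M, 3) per-control-point translations at time t
    ctrl_radius: (M,) per-control-point kernel radii (assumed learnable)
    """
    # k nearest control points for each Gaussian
    dist = torch.cdist(gauss_xyz, ctrl_xyz)              # (N, M)
    knn_dist, knn_idx = dist.topk(k, largest=False)      # (N, k)

    # RBF interpolation weights, normalized over the k neighbors
    radius = ctrl_radius[knn_idx]                         # (N, k)
    w = torch.exp(-knn_dist**2 / (2 * radius**2))
    w = w / (w.sum(dim=-1, keepdim=True) + 1e-8)          # (N, k)

    # apply each neighbor's rigid transform about its own pivot, then blend
    R = ctrl_rot[knn_idx]                                 # (N, k, 3, 3)
    T = ctrl_trans[knn_idx]                               # (N, k, 3)
    P = ctrl_xyz[knn_idx]                                 # (N, k, 3)
    local = gauss_xyz[:, None, :] - P                     # offsets from each control point
    warped = torch.einsum('nkij,nkj->nki', R, local) + P + T
    return (w[..., None] * warped).sum(dim=1)             # (N, 3) deformed Gaussian centers
```

In practice the per-control-point rotations and translations would be predicted by the deformation MLP for each timestamp (e.g., as quaternions converted to matrices), and the same blended transforms would presumably also rotate each Gaussian's orientation, not only its center; those details are omitted here for brevity.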