Forecasting future scenarios in dynamic environments is essential for intelligent decision-making and navigation, yet it remains an open challenge in computer vision and robotics. Existing approaches each cover only part of the problem: video prediction cannot forecast from arbitrary viewpoints, and novel-view synthesis cannot predict temporal dynamics. In this paper, we introduce GaussianPrediction, a novel framework that equips 3D Gaussian representations with dynamic scene modeling and future scenario synthesis. Given video observations of a dynamic scene, GaussianPrediction can forecast future states from any viewpoint. To this end, we first propose a 3D Gaussian canonical space with deformation modeling to capture the appearance and geometry of dynamic scenes, and integrate a lifecycle property into the Gaussians to handle irreversible deformations. To make prediction feasible and efficient, we develop a concentric motion distillation approach that distills the scene motion into a set of key points. Finally, a Graph Convolutional Network predicts the motions of these key points, enabling the rendering of photorealistic images of future scenarios. Our framework performs strongly on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.
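As a rough illustration of the final step, predicting key-point motion with graph convolutions, the sketch below applies two graph-convolution layers to a small set of key-point trajectories. It is not the GaussianPrediction implementation: the abstract does not specify how the graph is built, what the node features are, or how many layers are used. The k-nearest-neighbour graph, the flattened-trajectory features, the random untrained weights, and all function names (`knn_adjacency`, `normalize_adjacency`, `gcn_layer`, `predict_next_positions`) are assumptions made purely for illustration.

```python
import numpy as np


def knn_adjacency(points, k):
    """Symmetric k-nearest-neighbour adjacency with self-loops (assumed graph)."""
    n = points.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbour
    nbrs = np.argsort(dist2, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)  # make the graph undirected
    adj += np.eye(n)              # add self-loops
    return adj


def normalize_adjacency(adj):
    """Symmetric normalisation D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat, features, weight, activation=True):
    """One graph convolution: A_hat @ H @ W, optionally followed by ReLU."""
    out = a_hat @ features @ weight
    return np.maximum(out, 0.0) if activation else out


def predict_next_positions(history, w1, w2, k=3):
    """Predict the next key-point positions from past ones.

    history: array of shape (T, N, 3), T past frames of N key points.
    Returns an array of shape (N, 3).
    """
    t, n, _ = history.shape
    current = history[-1]
    a_hat = normalize_adjacency(knn_adjacency(current, k))
    # Node feature (assumption): each key point's flattened past trajectory.
    feats = history.transpose(1, 0, 2).reshape(n, t * 3)
    hidden = gcn_layer(a_hat, feats, w1)
    delta = gcn_layer(a_hat, hidden, w2, activation=False)
    return current + delta  # predicted displacement added to the last frame


# Toy usage with random data and untrained weights (illustrative only).
rng = np.random.default_rng(0)
history = rng.normal(size=(4, 6, 3))        # 4 frames, 6 key points
w1 = rng.normal(scale=0.1, size=(12, 8))    # 4 frames * 3 coords -> 8 hidden
w2 = rng.normal(scale=0.1, size=(8, 3))     # 8 hidden -> 3D displacement
next_positions = predict_next_positions(history, w1, w2, k=2)
```

In a trained system the weights would be learned from observed key-point trajectories, and the predicted key-point motion would then drive the Gaussians before rendering; both steps are outside the scope of this sketch.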