Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naïve approach of directly optimizing 3DGs per frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods.
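To make the central mechanism concrete, the PyTorch sketch below illustrates one way a Neural Transformation Cache could be structured: a small network queried at each Gaussian's position that outputs a per-Gaussian translation and rotation for the current frame, so that only the compact cache, rather than the full set of Gaussian parameters, is optimized per frame. This is a minimal sketch under stated assumptions, not the paper's implementation: the `NeuralTransformationCache` name, the sinusoidal positional encoding, the layer widths, and the identity-biased quaternion output are illustrative choices (the paper pairs its MLP with a more compact grid-based encoding).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeuralTransformationCache(nn.Module):
    """Illustrative NTC-style network (assumed architecture, not the paper's):
    given the positions of 3D Gaussians carried over from the previous frame,
    predict a per-Gaussian translation and a rotation (unit quaternion)."""

    def __init__(self, num_freqs: int = 6, hidden: int = 64):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 3 * 2 * num_freqs  # sin/cos per frequency per axis
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 7),  # 3 translation + 4 quaternion components
        )

    def encode(self, xyz: torch.Tensor) -> torch.Tensor:
        # Standard frequency encoding: [sin(2^k x), cos(2^k x)] per axis.
        freqs = 2.0 ** torch.arange(self.num_freqs, device=xyz.device)
        angles = xyz[..., None] * freqs                    # (N, 3, F)
        enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return enc.flatten(start_dim=-2)                   # (N, 3 * 2F)

    def forward(self, xyz: torch.Tensor):
        out = self.mlp(self.encode(xyz))
        d_xyz = out[..., :3]  # per-Gaussian translation
        # Bias toward the identity quaternion so an untrained cache is
        # (approximately) a no-op -- a design assumption for this sketch.
        identity = torch.tensor([1.0, 0.0, 0.0, 0.0], device=out.device)
        d_quat = F.normalize(out[..., 3:] + identity, dim=-1)
        return d_xyz, d_quat


if __name__ == "__main__":
    ntc = NeuralTransformationCache()
    xyz = torch.rand(1024, 3)          # stand-in for Gaussian centers
    d_xyz, d_quat = ntc(xyz)
    print(d_xyz.shape, d_quat.shape)   # torch.Size([1024, 3]) torch.Size([1024, 4])
```

Under this reading, a per-frame training step would optimize only the cache's parameters against the current frame's multi-view images while the Gaussians from the previous frame stay frozen; the predicted translations and rotations are then applied to the Gaussians before advancing to the next frame.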