Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naïve approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods.