This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost. Experiments show that the proposed approach achieves state-of-the-art rendering quality on the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering with a speed of 41.7 fps for $512 \times 512$ images on an RTX 3090 GPU. The code is available at https://zju3dv.github.io/mlp_maps/.
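To make the MLP-map idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes details the abstract does not give: a per-frame latent vector as decoder input, a fixed random linear map standing in for the shared 2D CNN decoder, projection of query points onto the xy plane, nearest-cell lookup, and a two-layer MLP with 16 hidden units. All function names and sizes are hypothetical.

```python
import numpy as np

IN_DIM, HIDDEN, OUT_DIM = 3, 16, 4  # assumed sizes: xyz in, (density, r, g, b) out
N_PARAMS = IN_DIM * HIDDEN + HIDDEN + HIDDEN * OUT_DIM + OUT_DIM


def decode_mlp_map(latent, grid=8, seed=0):
    """Hypothetical stand-in for the shared 2D CNN decoder.

    Maps a per-frame latent vector to a (grid, grid, N_PARAMS) map in which
    every cell holds the parameters of one shallow MLP. The fixed seed makes
    the decoder weights identical for all frames, i.e. shared.
    """
    rng = np.random.default_rng(seed)
    weights = 0.1 * rng.standard_normal((latent.shape[0], grid * grid * N_PARAMS))
    return (latent @ weights).reshape(grid, grid, N_PARAMS)


def query_radiance(mlp_map, pts):
    """Evaluate density and color at points in [0, 1)^3 for one frame."""
    grid = mlp_map.shape[0]
    # Assumption: project each 3D point onto the xy plane to pick its cell.
    ij = np.clip((pts[:, :2] * grid).astype(int), 0, grid - 1)
    params = mlp_map[ij[:, 0], ij[:, 1]]  # (N, N_PARAMS)

    # Unpack the flat parameter vector into the two layers of the shallow MLP.
    o = 0
    w1 = params[:, o:o + IN_DIM * HIDDEN].reshape(-1, IN_DIM, HIDDEN)
    o += IN_DIM * HIDDEN
    b1 = params[:, o:o + HIDDEN]
    o += HIDDEN
    w2 = params[:, o:o + HIDDEN * OUT_DIM].reshape(-1, HIDDEN, OUT_DIM)
    o += HIDDEN * OUT_DIM
    b2 = params[:, o:o + OUT_DIM]

    hidden = np.maximum(np.einsum("ni,nih->nh", pts, w1) + b1, 0.0)  # ReLU
    out = np.einsum("nh,nho->no", hidden, w2) + b2
    density = np.maximum(out[:, 0], 0.0)      # non-negative density
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))   # colors squashed into (0, 1)
    return density, rgb


# Usage: one latent vector per frame, one shared decoder for all frames.
frame_latent = np.random.default_rng(1).standard_normal(32)
mlp_map = decode_mlp_map(frame_latent)
density, rgb = query_radiance(mlp_map, np.random.default_rng(2).random((5, 3)))
```

Under these assumptions, the per-frame storage is only the latent vector (32 values here) rather than the full map (8 × 8 × 132 values), which illustrates why predicting the parameters with a shared decoder keeps the storage cost low, while each query still evaluates only a small two-layer MLP.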