Rendering photo-realistic novel-view images of complex scenes is a long-standing challenge in computer graphics. Recent work on view synthesis has substantially improved both rendering quality and rendering speed. However, when complex dynamic scenes are rendered from sparse views, quality remains limited by occlusion, and rendering high-resolution images of dynamic scenes is still far from real-time. In this work, we propose a generalizable view synthesis method that renders high-resolution novel-view images of complex static and dynamic scenes in real time from sparse views. To address the occlusion problems that arise from the sparsity of the input views and the complexity of the captured scenes, we introduce an explicit 3D visibility reasoning approach that efficiently estimates the visibility of sampled 3D points with respect to the input views. The approach is fully differentiable and fits naturally into the volume rendering pipeline, which allows us to train our networks with only multi-view images as supervision while refining geometry and texture simultaneously. In addition, each module in our pipeline is designed to bypass the time-consuming MLP querying process and to improve the quality of high-resolution renderings, enabling real-time rendering of high-resolution novel-view images. Experimental results show that our method outperforms previous view synthesis methods in both rendering quality and speed, particularly on complex dynamic scenes with sparse views.
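
As a rough illustration of what estimating the visibility of a sampled 3D point with respect to an input view can look like, the sketch below uses a generic soft depth test. It is not the method proposed in this work, whose details the abstract does not give. All names here (`project`, `soft_visibility`, `sharpness`) are hypothetical, and the sketch assumes a pinhole camera with intrinsics `K`, a world-to-camera rotation `R` and translation `t`, and a per-view depth map.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def project(point, K, R, t):
    """Project a world-space point into a pinhole camera (assumed model).

    Returns (u, v, z), where z is the camera-space depth, or None if the
    point lies on or behind the camera plane.
    """
    # Camera-space coordinates: cam = R @ point + t
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    z = cam[2]
    if z <= 0.0:
        return None
    u = K[0][0] * cam[0] / z + K[0][2]
    v = K[1][1] * cam[1] / z + K[1][2]
    return u, v, z


def soft_visibility(point, K, R, t, depth_map, sharpness=50.0):
    """Soft visibility in [0, 1] of `point` from one input view.

    Compares the point's depth with the surface depth stored in the view's
    depth map. The sigmoid keeps the test smooth with respect to depth;
    the nearest-pixel lookup used here for brevity is not differentiable.
    """
    proj = project(point, K, R, t)
    if proj is None:
        return 0.0
    u, v, z = proj
    col, row = int(round(u)), int(round(v))
    if not (0 <= row < len(depth_map) and 0 <= col < len(depth_map[0])):
        return 0.0  # the point projects outside the image
    # Close to 1 in front of the observed surface, close to 0 behind it.
    return _sigmoid(sharpness * (depth_map[row][col] - z))
```

In a volume rendering pipeline, weights of this kind could be used to down-weight image features from views in which a sampled point is occluded. The hypothetical `sharpness` parameter controls how abruptly the weight falls off behind the observed surface.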