Rendering 3D human appearance from a single image in real time is crucial for achieving holographic communication and immersive VR/AR. Existing methods either rely on multi-camera setups or are constrained to offline operation. In this paper, we propose R2Human, the first approach for real-time inference and rendering of photorealistic 3D human appearance from a single image. The core of our approach is to combine the strengths of implicit texture fields and explicit neural rendering through our novel representation, the Z-map. Building on this, we present an end-to-end network that performs high-fidelity color reconstruction of visible regions and reliable color inference for occluded regions. To further enhance the 3D perception ability of our network, we leverage the Fourier occupancy field as a prior for generating the texture field and for providing a sampling surface in the rendering stage. We also propose a consistency loss and a spatial fusion strategy to ensure multi-view coherence. Experimental results show that our method outperforms state-of-the-art methods on both synthetic data and challenging real-world images while running in real time. The project page can be found at http://cic.tju.edu.cn/faculty/likun/projects/R2Human.
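To make the core idea concrete, the sketch below illustrates one plausible reading of the pipeline described above: an occupancy field supplies a per-pixel sampling surface (a "Z-map" of ray-surface depths), and an implicit texture field is then queried at those 3D surface points to produce pixel colors. This is a minimal, hypothetical illustration, not the paper's actual architecture; the function names, tensor shapes, thresholds, and the small MLP are assumptions introduced here for clarity.

```python
# Hypothetical sketch: occupancy field -> per-pixel Z-map -> texture-field query.
import torch
import torch.nn as nn

class TextureField(nn.Module):
    """Tiny MLP mapping a 3D point plus a per-pixel image feature to RGB."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),
        )

    def forward(self, points, feats):
        return self.mlp(torch.cat([points, feats], dim=-1))

def z_map_from_occupancy(occupancy, z_values, threshold=0.5):
    """Depth of the first occupied sample along each camera ray.

    occupancy: (H, W, D) occupancy probabilities sampled along rays
    z_values:  (D,) depth of each sample along a ray
    Returns an (H, W) depth map; pixels with no surface fall back to the far depth.
    """
    occupied = occupancy > threshold                    # (H, W, D) boolean hits
    first_hit = occupied.float().argmax(dim=-1)         # index of first hit (0 if none)
    has_hit = occupied.any(dim=-1)                      # mask of rays that hit the surface
    z_map = z_values[first_hit]                         # (H, W) candidate depths
    return torch.where(has_hit, z_map, z_values[-1].expand_as(z_map))

def render_colors(z_map, pixel_dirs, texture_field, pixel_feats):
    """Back-project each pixel to its surface point and query the texture field."""
    points = pixel_dirs * z_map.unsqueeze(-1)           # (H, W, 3) camera-space surface points
    return texture_field(points, pixel_feats)           # (H, W, 3) RGB

if __name__ == "__main__":
    H, W, D, F = 8, 8, 16, 32
    occupancy = torch.rand(H, W, D)                     # stand-in for a predicted occupancy field
    z_values = torch.linspace(0.5, 2.5, D)
    z_map = z_map_from_occupancy(occupancy, z_values)
    dirs = nn.functional.normalize(torch.randn(H, W, 3), dim=-1)
    feats = torch.randn(H, W, F)                        # stand-in for image features
    rgb = render_colors(z_map, dirs, TextureField(F), feats)
    print(rgb.shape)                                    # torch.Size([8, 8, 3])
```

In this reading, the explicit Z-map restricts texture-field evaluation to a single surface point per pixel, which is one way a method could reach real-time rates compared with dense volumetric sampling.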