Neural volume rendering enables photo-realistic rendering of a human performer from free viewpoints, a critical task in immersive VR/AR applications. In practice, however, it is severely limited by the high computational cost of the rendering process. To address this, we propose UV Volumes, a new approach that renders an editable free-view video of a human performer in real time. It separates the high-frequency (i.e., non-smooth) human appearance from the 3D volume and encodes it into 2D neural texture stacks (NTS). The smooth UV volumes allow much smaller and shallower neural networks to obtain densities and texture coordinates in 3D, while detailed appearance is captured in the 2D NTS. For editability, the mapping between the parameterized human model and the smooth texture coordinates enables better generalization to novel poses and shapes. Furthermore, the use of NTS enables interesting applications, e.g., retexturing. Extensive experiments on the CMU Panoptic, ZJU Mocap, and H36M datasets show that our model can render 960 × 540 images at 30 FPS on average with photo-realism comparable to state-of-the-art methods. The project and supplementary materials are available at https://fanegg.github.io/UV-Volumes.
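The split between a smooth 3D volume and a detailed 2D texture can be illustrated with a minimal sketch. This is not the authors' implementation: it omits all neural networks, uses a fixed array as a stand-in for one neural texture, and the function names `composite_uv_along_ray` and `sample_texture_bilinear` are hypothetical. It assumes the standard volume-rendering quadrature to composite per-sample texture coordinates along a ray, normalizes by accumulated opacity (an assumption made here for simplicity), and then looks up appearance with bilinear interpolation.

```python
import numpy as np


def composite_uv_along_ray(sigmas, uvs, deltas, eps=1e-8):
    """Alpha-composite per-sample UV coordinates along one ray.

    sigmas: (N,) densities, uvs: (N, 2) texture coordinates in [0, 1],
    deltas: (N,) distances between consecutive samples.
    Returns the opacity-normalized UV and the accumulated opacity.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: probability that the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    opacity = weights.sum()
    # Normalizing by opacity is an assumption of this sketch.
    uv = (weights[:, None] * uvs).sum(axis=0) / max(opacity, eps)
    return uv, opacity


def sample_texture_bilinear(texture, uv):
    """Bilinearly sample an (H, W, C) texture at uv = (u, v) in [0, 1]^2."""
    h, w, _ = texture.shape
    u, v = np.clip(uv, 0.0, 1.0)
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * texture[y0, x0] + fx * texture[y0, x1]
    bottom = (1 - fx) * texture[y1, x0] + fx * texture[y1, x1]
    return (1 - fy) * top + fy * bottom


# Toy ray: empty space first, then a dense sample mapped to the texture center.
sigmas = np.array([0.0, 1e3])
deltas = np.array([0.1, 0.1])
uvs = np.array([[0.0, 0.0], [0.5, 0.5]])
uv, opacity = composite_uv_along_ray(sigmas, uvs, deltas)

# Stand-in 2 x 2 texture with 3 channels (a real NTS would be learned).
texture = np.array([[[0.0] * 3, [1.0] * 3],
                    [[1.0] * 3, [2.0] * 3]])
color = sample_texture_bilinear(texture, uv)
```

Because the appearance lives in the texture array, swapping `texture` for another array of the same shape changes the output color without touching the 3D quantities, which is the intuition behind retexturing.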