Creating controllable, photorealistic, and geometrically detailed digital doubles of real humans solely from video data is a key challenge in Computer Graphics and Vision, especially when real-time performance is required. Recent methods attach a neural radiance field (NeRF) to an articulated structure, e.g., a body model or a skeleton, to map points into a pose-canonical space while conditioning the NeRF on the skeletal pose. These approaches typically parameterize the neural field with a multi-layer perceptron (MLP), leading to a slow runtime. To address this drawback, we propose TriHuman, a novel human-tailored, deformable, and efficient tri-plane representation, which achieves real-time performance, state-of-the-art pose-controllable geometry synthesis, and photorealistic rendering quality. At the core, we non-rigidly warp global ray samples into our undeformed tri-plane texture space, which effectively addresses the problem of global points being mapped to the same tri-plane locations. We then show how such a tri-plane feature representation can be conditioned on the skeletal motion to account for dynamic appearance and geometry changes. Our results demonstrate a clear step towards higher quality in terms of geometry and appearance modeling of humans as well as runtime performance.
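
To illustrate what a tri-plane feature lookup can look like, the following is a minimal NumPy sketch and not the TriHuman implementation. It assumes the sample points have already been warped into a normalized canonical cube [0, 1]^3 (the non-rigid warping step itself is not shown), that the three planes are axis-aligned, and that the per-plane features are aggregated by summation, which is one common choice. The names `bilinear_sample` and `triplane_features` are hypothetical.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample a feature plane of shape (H, W, C).

    u indexes the W axis and v indexes the H axis; both are arrays of
    shape (N,) with values expected in [0, 1] (values outside are clipped).
    """
    h, w, _ = plane.shape
    x = np.clip(u, 0.0, 1.0) * (w - 1)
    y = np.clip(v, 0.0, 1.0) * (h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[:, None]  # interpolation weight along W
    wy = (y - y0)[:, None]  # interpolation weight along H
    top = plane[y0, x0] * (1.0 - wx) + plane[y0, x1] * wx
    bottom = plane[y1, x0] * (1.0 - wx) + plane[y1, x1] * wx
    return top * (1.0 - wy) + bottom * wy


def triplane_features(planes, pts):
    """Look up per-point features from three axis-aligned planes.

    planes: dict with keys 'xy', 'xz', 'yz', each an array of shape (H, W, C).
    pts:    array of shape (N, 3) with coordinates in [0, 1]^3
            (assumed to be already warped into the canonical space).
    Summation is an assumed aggregation; concatenation is another option.
    """
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    return (
        bilinear_sample(planes["xy"], x, y)
        + bilinear_sample(planes["xz"], x, z)
        + bilinear_sample(planes["yz"], y, z)
    )


# Usage: random planes with 8 feature channels, four query points.
rng = np.random.default_rng(0)
planes = {k: rng.standard_normal((16, 16, 8)) for k in ("xy", "xz", "yz")}
pts = rng.uniform(0.0, 1.0, size=(4, 3))
feats = triplane_features(planes, pts)  # shape (4, 8)
```

Because each lookup reduces to three 2D interpolations rather than a deep MLP evaluation, most of the representation's capacity sits in the plane textures. In a full pipeline, the resulting feature vector would typically be decoded by a small network into geometry and appearance values.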