Synthesizing photorealistic 4D human head avatars from videos is essential for VR/AR, telepresence, and video game applications. Although existing Neural Radiance Fields (NeRF)-based methods achieve high-fidelity results, their computational expense limits their use in real-time applications. To overcome this limitation, we introduce BakedAvatar, a novel representation for real-time neural head avatar synthesis that is deployable in a standard polygon rasterization pipeline. Our approach extracts deformable multi-layer meshes from learned isosurfaces of the head and computes expression-, pose-, and view-dependent appearances that can be baked into static textures for efficient rasterization. We thus propose a three-stage pipeline for neural head avatar synthesis: learning continuous deformation, manifold, and radiance fields; extracting layered meshes and textures; and fine-tuning texture details with differentiable rasterization. Experimental results demonstrate that our representation achieves synthesis quality comparable to other state-of-the-art methods while significantly reducing inference time. We further showcase various head avatar synthesis results from monocular videos, including view synthesis, face reenactment, expression editing, and pose editing, all at interactive frame rates.