Synthesizing photorealistic 4D human head avatars from videos is essential for VR/AR, telepresence, and video game applications. Although existing Neural Radiance Fields (NeRF)-based methods achieve high-fidelity results, their computational cost limits their use in real-time applications. To overcome this limitation, we introduce BakedAvatar, a novel representation for real-time neural head avatar synthesis that can be deployed in a standard polygon rasterization pipeline. Our approach extracts deformable multi-layer meshes from learned isosurfaces of the head and computes expression-, pose-, and view-dependent appearances that can be baked into static textures for efficient rasterization. We propose a three-stage pipeline for neural head avatar synthesis: (1) learning continuous deformation, manifold, and radiance fields; (2) extracting layered meshes and textures; and (3) fine-tuning texture details with differentiable rasterization. Experimental results demonstrate that our representation achieves synthesis quality comparable to other state-of-the-art methods while significantly reducing inference time. We further showcase various head avatar synthesis results from monocular videos, including view synthesis, face reenactment, expression editing, and pose editing, all at interactive frame rates.
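As a rough illustration of how a multi-layer mesh representation might be turned into a final image, the sketch below blends per-layer images back to front. The abstract does not state how the layers are combined, so this is only an assumption: each layer is taken to be already rasterized into an RGBA image from its baked texture, and the layers are merged with the standard "over" operator. The function name `composite_layers` and the array layout are hypothetical and not taken from the paper.

```python
import numpy as np

def composite_layers(layers_back_to_front):
    """Blend RGBA layer images of shape (H, W, 4), values in [0, 1].

    Assumption (not from the paper): layers are ordered from the innermost
    (back) to the outermost (front) and combined with the 'over' operator.
    """
    height, width, _ = layers_back_to_front[0].shape
    out = np.zeros((height, width, 3), dtype=np.float64)  # black background
    for layer in layers_back_to_front:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        # Each nearer layer covers what is behind it in proportion to its alpha.
        out = alpha * rgb + (1.0 - alpha) * out
    return out

# Toy usage: an opaque red layer behind a half-transparent blue layer.
back = np.zeros((2, 2, 4))
back[..., 0] = 1.0   # red channel
back[..., 3] = 1.0   # fully opaque
front = np.zeros((2, 2, 4))
front[..., 2] = 1.0  # blue channel
front[..., 3] = 0.5  # half transparent
image = composite_layers([back, front])
```

In this toy case every pixel ends up as an equal mix of red and blue. In a rasterization pipeline this kind of blending would typically be done by the GPU's fixed-function blending stage rather than in array code.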