Existing approaches to animatable NeRF-based head avatars are either built upon face templates or use the templates' expression coefficients as the driving signal. Despite promising progress, their performance is heavily bounded by the expressive power and tracking accuracy of the templates. In this work, we present LatentAvatar, an expressive neural head avatar driven by latent expression codes. These codes are learned end-to-end in a self-supervised manner without templates, which lets our method avoid the expression and tracking issues that templates introduce. To achieve this, we leverage a latent head NeRF to learn person-specific latent expression codes from a monocular portrait video, and further design a Y-shaped network to learn latent expression codes shared across subjects for cross-identity reenactment. By optimizing the photometric reconstruction objectives in NeRF, the latent expression codes become 3D-aware while faithfully capturing high-frequency expression details. Moreover, by learning a mapping between the latent expression codes learned in the shared and person-specific settings, LatentAvatar is able to perform expressive reenactment between different subjects. Experimental results show that LatentAvatar captures challenging expressions and the subtle movements of teeth and even eyeballs, and outperforms previous state-of-the-art solutions in both quantitative and qualitative comparisons. Project page: https://www.liuyebin.com/latentavatar.
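The data flow of the cross-identity pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single-layer encoder and decoders, the vector sizes, the random weights, and every name in it (`shared_encoder`, `decoders`, `mapper`, `drive_code`) are assumptions made for illustration, and the NeRF renderer itself is omitted. It shows only the Y shape (one shared encoder, one decoder per subject), a photometric reconstruction loss, and a mapping from the shared code to a person-specific code.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM, LATENT_DIM = 64, 8  # hypothetical sizes, not taken from the paper


def linear(in_dim, out_dim):
    """Random weights and zero bias for one dense layer (illustrative only)."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)


def dense(x, params):
    """Apply one dense layer followed by tanh."""
    w, b = params
    return np.tanh(x @ w + b)


# Stem of the "Y": one encoder shared by both subjects (assumed structure).
shared_encoder = linear(IMG_DIM, LATENT_DIM)
# Branches of the "Y": one decoder per subject (assumed structure).
decoders = {"A": linear(LATENT_DIM, IMG_DIM), "B": linear(LATENT_DIM, IMG_DIM)}
# Mapping from the shared code to the target's person-specific code.
mapper = linear(LATENT_DIM, LATENT_DIM)


def encode_shared(image):
    """Shared latent expression code of a flattened face image."""
    return dense(image, shared_encoder)


def reconstruct(image, subject):
    """Decode the shared code with the chosen subject's decoder."""
    return dense(encode_shared(image), decoders[subject])


def photometric_loss(image, subject):
    """Mean squared error between the reconstruction and the input."""
    return float(np.mean((reconstruct(image, subject) - image) ** 2))


def drive_code(source_image):
    """Person-specific code for the target avatar, from a source frame."""
    return dense(encode_shared(source_image), mapper)


frame_a = rng.uniform(-1.0, 1.0, IMG_DIM)  # stand-in for a frame of subject A
code_for_b = drive_code(frame_a)           # would condition B's head NeRF
```

In a full system along these lines, `code_for_b` would condition the target subject's latent head NeRF, and all weights would be trained rather than sampled at random.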