In this paper, we present decomposed triplane-hash neural radiance fields (DT-NeRF), a framework that significantly improves the photorealistic rendering of talking faces and achieves state-of-the-art results on key evaluation datasets. Our architecture decomposes the facial region into two specialized triplanes: one for the mouth and one for the broader facial features. We introduce audio features as residual terms and integrate them into the model as query vectors through an audio-mouth-face transformer. In addition, our method uses Neural Radiance Fields (NeRF) to enrich the volumetric representation of the entire face through additive volumetric rendering. Comprehensive experiments corroborate the effectiveness and superiority of the proposed approach.
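To make the idea of additive volumetric rendering concrete, the following is a minimal sketch of one common way to composite two radiance fields along a single ray: the densities of the two fields are summed, the colors are mixed in proportion to density, and the result is alpha-composited as in standard NeRF. The abstract does not specify DT-NeRF's exact formulation, so the function name `composite_two_fields`, its arguments, and the density-weighted color mix are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np

def composite_two_fields(sigma_mouth, rgb_mouth, sigma_face, rgb_face, deltas):
    """Render one ray from two radiance fields by summing their densities.

    Hypothetical helper for illustration; not taken from the paper.

    sigma_*: (N,) non-negative densities at N samples along the ray.
    rgb_*:   (N, 3) colors predicted by each field at the same samples.
    deltas:  (N,) distances between consecutive samples.
    """
    # Additive composition: the combined density is the sum of both fields.
    sigma = sigma_mouth + sigma_face

    # Assumed color rule: density-weighted mix of the two fields' colors.
    # eps avoids division by zero where both densities vanish.
    eps = 1e-10
    rgb = (sigma_mouth[:, None] * rgb_mouth
           + sigma_face[:, None] * rgb_face) / (sigma[:, None] + eps)

    # Standard NeRF quadrature: per-sample opacity and transmittance.
    alpha = 1.0 - np.exp(-sigma * deltas)
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha[:-1])])
    weights = trans * alpha

    # Pixel color is the weighted sum of sample colors along the ray.
    color = (weights[:, None] * rgb).sum(axis=0)
    return color, weights
```

Under this rule the composite is symmetric in the two fields, so neither the mouth nor the face branch takes priority; a sample's contribution depends only on how much density each field places there.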