Radiance fields have demonstrated impressive performance in synthesizing lifelike 3D talking heads. However, because steep appearance changes are difficult to fit, the prevailing paradigm, which represents facial motions by directly modifying point appearance, may lead to distortions in dynamic regions. To tackle this challenge, we introduce TalkingGaussian, a deformation-based radiance fields framework for high-fidelity talking head synthesis. Leveraging point-based Gaussian Splatting, our method represents facial motions by applying smooth and continuous deformations to persistent Gaussian primitives. It does not need to learn the difficult appearance changes that previous methods rely on. Owing to this simplification, precise facial motions can be synthesized while facial features remain highly intact. Under this deformation paradigm, we further identify a face-mouth motion inconsistency that can interfere with the learning of detailed speaking motions. To resolve this conflict, we decompose the model into two branches, one for the face and one for the inside-mouth area. This simplifies each learning task and helps reconstruct more accurate motion and structure in the mouth region. Extensive experiments demonstrate that our method renders high-quality, lip-synchronized talking head videos with better facial fidelity and higher efficiency than previous methods.
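To make the deformation paradigm concrete, the following minimal Python sketch contrasts it with appearance modification. Each persistent Gaussian primitive keeps a fixed color and opacity, and motion is expressed only as smooth offsets to its position and scale. Everything here is illustrative and hypothetical rather than the authors' implementation. The `Gaussian` class, the `toy_deformation_field` function (which stands in for a learned, motion-conditioned network), and the `composite` rule, which alpha-blends the face branch over the inside-mouth branch, are all assumptions. Rotation offsets are omitted for brevity.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gaussian:
    """A persistent primitive; color and opacity are never modified per frame."""
    position: tuple[float, float, float]
    scale: tuple[float, float, float]
    color: tuple[float, float, float]
    opacity: float


def deform(g: Gaussian, d_pos, d_scale) -> Gaussian:
    # Shift the position and rescale (log-space offset); appearance is left untouched.
    pos = tuple(p + d for p, d in zip(g.position, d_pos))
    scale = tuple(s * math.exp(d) for s, d in zip(g.scale, d_scale))
    return replace(g, position=pos, scale=scale)


def toy_deformation_field(g: Gaussian, motion: float):
    # Hypothetical stand-in for a learned network: a smooth, continuous function
    # of the primitive's position and a scalar motion condition (e.g. mouth opening).
    x, y, _ = g.position
    w = math.exp(-(x * x + y * y))  # influence decays away from the origin
    d_pos = (0.0, -0.1 * motion * w, 0.0)   # move the primitive downward
    d_scale = (0.0, 0.05 * motion * w, 0.0)  # stretch it vertically
    return d_pos, d_scale


def composite(face_rgb, face_alpha: float, mouth_rgb):
    # Assumed two-branch fusion: blend the face render over the inside-mouth render.
    return tuple(f * face_alpha + m * (1.0 - face_alpha)
                 for f, m in zip(face_rgb, mouth_rgb))
```

With `motion = 0.0` the primitive is returned unchanged, and for any motion value its color and opacity stay the same. Only the geometry moves, which mirrors the abstract's point that the model does not have to learn steep appearance changes.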