Recent advances in 3D Gaussian Splatting (3DGS) have unlocked significant potential for modeling 3D head avatars, offering greater flexibility than mesh-based methods and more efficient rendering than NeRF-based approaches. Despite these advances, creating controllable 3DGS-based head avatars remains time-intensive, often taking tens of minutes to hours. To expedite this process, we introduce the "Gaussian Deja-vu" framework, which first obtains a generalized model of the head avatar and then personalizes the result. The generalized model is trained on large 2D (synthetic and real) image datasets. It provides a well-initialized 3D Gaussian head that is further refined on a monocular video to produce the personalized head avatar. For personalization, we propose learnable expression-aware rectification blendmaps that correct the initial 3D Gaussians, ensuring rapid convergence without relying on neural networks. Experiments demonstrate that the proposed method meets its objectives: it outperforms state-of-the-art 3D Gaussian head avatars in photorealistic quality and reduces training time to a quarter of that of existing methods or less, producing the avatar in minutes.
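The expression-aware rectification idea can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's implementation: it assumes each Gaussian's position is corrected by a static offset map plus a linear blend of per-expression offset maps weighted by the current expression coefficients, all of which are directly optimizable tensors rather than network outputs (names like `rectify`, `static_map`, and `expr_maps` are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
num_gaussians, num_expr = 5, 4

# Learnable blendmaps (initialized to zero, i.e. no correction yet):
# one static per-Gaussian 3D offset map, plus one map per expression coefficient.
static_map = np.zeros((num_gaussians, 3))
expr_maps = np.zeros((num_expr, num_gaussians, 3))


def rectify(positions, expr_coeffs):
    """Apply an expression-weighted blendmap correction to initial Gaussian positions.

    positions:   (N, 3) initial Gaussian centers from the generalized model
    expr_coeffs: (K,)   expression coefficients for the current frame
    """
    correction = static_map + np.tensordot(expr_coeffs, expr_maps, axes=1)
    return positions + correction


init_positions = rng.standard_normal((num_gaussians, 3))
coeffs = np.array([0.5, 0.0, 0.2, 0.0])
out = rectify(init_positions, coeffs)
```

Because the correction is a direct sum of learnable maps rather than a neural network's output, each optimization step is cheap, which is consistent with the rapid-convergence claim above.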