Current personalized neural head avatars face a trade-off: lightweight models lack detail and realism, while high-quality, animatable avatars require significant computational resources, making them unsuitable for commodity devices. To address this gap, we introduce Gaussian Eigen Models (GEM), which provide high-quality, lightweight, and easily controllable head avatars. GEM uses 3D Gaussian primitives to represent appearance, combined with Gaussian splatting for rendering. Building on the success of mesh-based 3D morphable face models (3DMM), we define GEM as an ensemble of linear eigenbases that represent the head appearance of a specific subject. In particular, we construct linear bases to represent the position, scale, rotation, and opacity of the 3D Gaussians. This allows us to efficiently generate the Gaussian primitives of a specific head shape as a linear combination of the basis vectors, requiring only a low-dimensional parameter vector that contains the respective coefficients. We propose to construct these linear bases (GEM) by distilling high-quality, compute-intensive CNN-based Gaussian avatar models that can generate expression-dependent appearance changes such as wrinkles. These high-quality models are trained on multi-view videos of a subject and are distilled using a series of principal component analyses. Once we have obtained the bases that span the animatable appearance space of a specific person, we learn a regressor that takes a single RGB image as input and predicts the low-dimensional parameter vector corresponding to the shown facial expression. In a series of experiments, we compare GEM's self-reenactment and cross-person reenactment results to state-of-the-art 3D avatar methods, demonstrating GEM's higher visual quality and better generalization to new expressions.
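The core mechanism described above (distilling a set of per-Gaussian parameters into a linear eigenbasis via PCA, then reconstructing a configuration from a low-dimensional coefficient vector) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sample count, basis size, and synthetic "teacher" data are hypothetical, and only one parameter group (flattened positions) is shown rather than the full position/scale/rotation/opacity ensemble.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper):
n_samples = 200              # expression samples drawn from a teacher model
n_gaussians = 1000           # number of 3D Gaussian primitives
n_params = n_gaussians * 3   # per-Gaussian positions, flattened

# Stand-in for teacher outputs: low-rank structure plus small noise,
# mimicking expression-dependent variation of Gaussian positions.
latent = rng.normal(size=(n_samples, 8))
mixing = rng.normal(size=(8, n_params))
X = latent @ mixing + 0.01 * rng.normal(size=(n_samples, n_params))

# PCA via SVD of the mean-centered data matrix.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)

k = 8                  # size of the low-dimensional parameter vector
basis = Vt[:k]         # (k, n_params) linear eigenbasis

# Encode one sample into k coefficients, then reconstruct it as a
# linear combination of the basis vectors.
coeffs = (X[0] - mean) @ basis.T   # low-dimensional parameter vector
recon = mean + coeffs @ basis      # reconstructed Gaussian positions

rel_err = np.linalg.norm(recon - X[0]) / np.linalg.norm(X[0])
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this setup, animating the avatar reduces to predicting the `k` coefficients (e.g. with the image-conditioned regressor mentioned above) and evaluating one matrix product per frame, which is what makes the representation lightweight.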