HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars

We introduce HyperGaussians, a novel extension of 3D Gaussian Splatting for high-quality animatable face avatars. Creating such detailed face avatars from videos is a challenging problem with numerous applications in augmented and virtual reality. While tremendous successes have been achieved for static faces, animatable avatars from monocular videos still fall into the uncanny valley. The de facto standard, 3D Gaussian Splatting (3DGS), represents a face through a collection of 3D Gaussian primitives. 3DGS excels at rendering static faces, but the state of the art still struggles with nonlinear deformations, complex lighting effects, and fine details. While most related works focus on predicting better Gaussian parameters from expression codes, we rethink the 3D Gaussian representation itself and how to make it more expressive. Our insights lead to a novel extension of 3D Gaussians to high-dimensional multivariate Gaussians, dubbed 'HyperGaussians'. The higher dimensionality increases expressivity through conditioning on a learnable local embedding. However, splatting HyperGaussians is computationally expensive because it requires inverting a high-dimensional covariance matrix. We solve this by reparameterizing the covariance matrix, a technique we call the 'inverse covariance trick'. This trick makes HyperGaussians efficient enough to be seamlessly integrated into existing models. To demonstrate this, we plug HyperGaussians into FlashAvatar, the state of the art in fast monocular face avatars. Our evaluation on 19 subjects from 4 face datasets shows that HyperGaussians outperform 3DGS numerically and visually, particularly for high-frequency details such as eyeglass frames, teeth, complex facial movements, and specular reflections.
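
The abstract does not spell out the math. The following is a minimal NumPy sketch under one assumed reading. Here a HyperGaussian is treated as a joint Gaussian over m Gaussian attributes x and an n-dimensional latent embedding v. The 'inverse covariance trick' is read as parameterizing the precision matrix Lambda = Sigma^{-1} instead of Sigma. Under the covariance parameterization, conditioning on v (the Schur-complement form) must invert the large n x n block Sigma_vv. Under the precision parameterization, the same conditional needs only the small m x m block Lambda_xx. The function names and the dimensions m = 3, n = 8 are hypothetical and chosen for illustration only.

```python
import numpy as np


def conditional_from_covariance(mu, cov, v, m):
    """Condition a joint Gaussian over (x, v) on v, with x of dimension m.

    Hypothetical helper: this form solves against the n x n latent block Sigma_vv.
    """
    mu_x, mu_v = mu[:m], mu[m:]
    s_xx, s_vx, s_vv = cov[:m, :m], cov[m:, :m], cov[m:, m:]
    # gain = Sigma_xv Sigma_vv^{-1}; Sigma_vv is symmetric, so solve and transpose.
    gain = np.linalg.solve(s_vv, s_vx).T
    mean = mu_x + gain @ (v - mu_v)
    cond_cov = s_xx - gain @ s_vx  # Schur complement of Sigma_vv
    return mean, cond_cov


def conditional_from_precision(mu, prec, v, m):
    """Same conditional, parameterized by the precision Lambda = Sigma^{-1}.

    Hypothetical helper: only the small m x m block Lambda_xx is inverted.
    """
    mu_x, mu_v = mu[:m], mu[m:]
    l_xx, l_xv = prec[:m, :m], prec[:m, m:]
    mean = mu_x - np.linalg.solve(l_xx, l_xv @ (v - mu_v))
    cond_cov = np.linalg.inv(l_xx)  # m x m inverse only
    return mean, cond_cov


# Hypothetical sizes: 3 Gaussian attributes conditioned on an 8-dim embedding.
rng = np.random.default_rng(0)
m, n = 3, 8
a = rng.standard_normal((m + n, m + n))
cov = a @ a.T + (m + n) * np.eye(m + n)  # symmetric positive definite
mu = rng.standard_normal(m + n)
v = rng.standard_normal(n)

mean_a, cov_a = conditional_from_covariance(mu, cov, v, m)
mean_b, cov_b = conditional_from_precision(mu, np.linalg.inv(cov), v, m)
print(np.allclose(mean_a, mean_b), np.allclose(cov_a, cov_b))
```

Both functions return the same conditional mean and covariance. The precision form avoids the cubic-cost inversion of the latent block, which is the expense the abstract attributes to splatting HyperGaussians. In a learned model the precision matrix must also stay symmetric positive definite, for example via a Cholesky-style factor. That detail is likewise an assumption, not something stated above.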
