3D head avatars built with neural implicit volumetric representations have achieved unprecedented levels of photorealism. However, the computational cost of these methods remains a significant barrier to their widespread adoption, particularly in real-time applications such as virtual reality and teleconferencing. While fast neural rendering approaches have been developed for static scenes, these methods cannot be directly applied to dynamic facial performances, which require realistic, changing facial expressions. To address these challenges, we propose a novel fast 3D neural implicit head avatar model that achieves real-time rendering while maintaining fine-grained controllability and high rendering quality. Our key idea is the introduction of local hash table blendshapes, which are learned and attached to the vertices of an underlying parametric face model. These per-vertex hash tables are linearly merged using weights predicted by a CNN, resulting in expression-dependent embeddings. This representation enables efficient density and color prediction with a lightweight MLP, which is further accelerated by a hierarchical nearest neighbor search method. Extensive experiments show that our approach runs in real time while achieving rendering quality comparable to state-of-the-art methods and decent results on challenging expressions.
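
The linear merging of per-vertex hash tables can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the array shapes, the function names (`blend_hash_tables`, `spatial_hash`, `query_feature`), the hash constants, and the grid resolution are all assumptions made for illustration. The blend weights are passed in as a plain array standing in for the CNN output, and a brute-force nearest-vertex lookup stands in for the hierarchical nearest neighbor search.

```python
import numpy as np


def blend_hash_tables(tables, weights):
    """Linearly merge K hash-table blendshapes per vertex.

    tables:  (V, K, T, F) - V vertices, K blendshapes, T table entries, F features
    weights: (V, K)       - per-vertex blend weights (assumed to come from a CNN)
    returns: (V, T, F)    - one expression-dependent hash table per vertex
    """
    return np.einsum("vk,vktf->vtf", weights, tables)


def spatial_hash(cell, table_size):
    """Map an integer 3D cell to a table slot (hypothetical XOR-of-primes hash)."""
    x, y, z = (int(c) for c in cell)
    return ((x * 1) ^ (y * 2654435761) ^ (z * 805459861)) % table_size


def query_feature(point, vertices, blended, resolution=16):
    """Fetch the embedding for a 3D query point.

    Brute-force nearest vertex here; the abstract describes a hierarchical
    search, which this sketch does not reproduce.
    """
    nearest = int(np.argmin(np.linalg.norm(vertices - point, axis=1)))
    # Quantize the offset from the nearest vertex into a local grid cell.
    local = point - vertices[nearest]
    cell = np.floor(local * resolution).astype(int)
    slot = spatial_hash(cell, blended.shape[1])
    return blended[nearest, slot]
```

In a full pipeline, the feature returned by `query_feature` would be fed to a small MLP that predicts density and color; that stage is omitted here because the abstract does not specify its inputs or architecture.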