Recent advances in full-head reconstruction have been obtained by optimizing a neural field, through differentiable surface or volume rendering, to represent a single scene. While these techniques achieve unprecedented accuracy, they take several minutes, or even hours, because of the expensive optimization process they require. In this work, we introduce InstantAvatar, a method that recovers full-head avatars from a few images (down to just one) in a few seconds on commodity hardware. To speed up the reconstruction process, we propose a system that combines, for the first time, a voxel-grid neural field representation with a surface renderer. Notably, a naive combination of these two techniques leads to unstable optimizations that do not converge to valid solutions. To overcome this limitation, we present a novel statistical model that learns a prior distribution over 3D head signed distance functions using a voxel-grid-based architecture. The use of this prior model, in combination with other design choices, results in a system that achieves 3D head reconstructions with accuracy comparable to the state of the art, with a 100x speed-up.
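To make the pairing of a voxel-grid signed distance function (SDF) with a surface renderer more concrete, the following is a minimal illustrative sketch and not the InstantAvatar implementation. It assumes a dense grid of raw SDF values over the cube [-1, 1]^3 (the paper's grid and its learned prior are not reproduced here), uses an analytic sphere as a stand-in for a head, and locates the surface along a ray by sphere tracing. The names `make_sphere_sdf_grid`, `query_sdf`, and `sphere_trace` are hypothetical.

```python
import numpy as np


def make_sphere_sdf_grid(res=32, radius=0.5):
    """Sample a sphere SDF on a regular grid over [-1, 1]^3 (toy stand-in for a head SDF)."""
    axis = np.linspace(-1.0, 1.0, res)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.sqrt(x**2 + y**2 + z**2) - radius


def query_sdf(grid, p):
    """Trilinearly interpolate the voxel grid at a point p (clamped to [-1, 1]^3)."""
    res = grid.shape[0]
    # Map world coordinates to continuous voxel indices, clamped to the grid.
    u = np.clip((np.asarray(p, dtype=float) + 1.0) * 0.5 * (res - 1), 0.0, res - 1.0)
    i0 = np.minimum(u.astype(int), res - 2)  # lower corner of the enclosing cell
    t = u - i0                               # fractional position inside the cell
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Weight of each of the 8 cell corners.
                w = ((t[0] if dx else 1.0 - t[0])
                     * (t[1] if dy else 1.0 - t[1])
                     * (t[2] if dz else 1.0 - t[2]))
                value += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value


def sphere_trace(grid, origin, direction, max_steps=64, eps=1e-3, t_max=4.0):
    """Step along the ray by the queried SDF value; return the hit distance or None."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    t = 0.0
    for _ in range(max_steps):
        d = query_sdf(grid, origin + t * direction)
        if d < eps:       # close enough to the zero level set: surface hit
            return t
        t += d            # safe step: the SDF bounds the distance to the surface
        if t > t_max:     # ray left the region of interest without hitting
            return None
    return None


grid = make_sphere_sdf_grid()
hit = sphere_trace(grid, origin=[0.0, 0.0, -0.95], direction=[0.0, 0.0, 1.0])
miss = sphere_trace(grid, origin=[0.9, 0.9, -0.95], direction=[0.0, 0.0, 1.0])
print("hit distance:", hit, "| miss:", miss)
```

Each query here touches only eight grid values, with no network evaluation, which gives some intuition for why a grid-based field can be cheap to render. In an optimization setting the grid values would be free parameters fitted through a differentiable renderer, which is the regime where the abstract reports instability without a prior.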