Recent advances in full-head reconstruction have been obtained by optimizing a neural field through differentiable surface or volume rendering to represent a single scene. While these techniques achieve unprecedented accuracy, they take several minutes, or even hours, due to the expensive optimization process required. In this work, we introduce InstantAvatar, a method that recovers full-head avatars from few images (down to just one) in a few seconds on commodity hardware. To speed up the reconstruction process, we propose a system that combines, for the first time, a voxel-grid neural field representation with a surface renderer. Notably, a naive combination of these two techniques leads to unstable optimizations that do not converge to valid solutions. To overcome this limitation, we present a novel statistical model that learns a prior distribution over 3D head signed distance functions using a voxel-grid-based architecture. The use of this prior model, in combination with other design choices, results in a system that achieves 3D head reconstructions with accuracy comparable to the state of the art at a 100x speed-up.
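To make the pairing of a voxel-grid signed distance function (SDF) with a surface renderer concrete, the following is a minimal sketch and not the InstantAvatar implementation. It assumes a dense grid filled with the analytic SDF of a sphere as a stand-in for a learned head SDF, trilinear interpolation for grid queries, and plain sphere tracing to find ray-surface intersections. The function names (`make_sphere_sdf_grid`, `query_sdf`, `sphere_trace`) and all parameter values are hypothetical, and the sketch uses NumPy without any differentiability or learned prior.

```python
import numpy as np


def make_sphere_sdf_grid(resolution=64, radius=0.5, half_extent=1.0):
    """Sample a sphere SDF on a regular grid over [-half_extent, half_extent]^3.

    The sphere is only a placeholder for a learned head SDF (assumption).
    """
    axis = np.linspace(-half_extent, half_extent, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.sqrt(x**2 + y**2 + z**2) - radius


def query_sdf(grid, points, half_extent=1.0):
    """Trilinearly interpolate the grid at world-space points of shape (N, 3)."""
    res = grid.shape[0]
    # Map world coordinates to continuous voxel indices; clamp to the grid.
    idx = (points + half_extent) / (2.0 * half_extent) * (res - 1)
    idx = np.clip(idx, 0.0, res - 1 - 1e-6)
    i0 = np.floor(idx).astype(int)
    frac = idx - i0
    value = np.zeros(len(points))
    # Accumulate the weighted contribution of the 8 surrounding voxel corners.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = frac[:, 0] if dx else 1.0 - frac[:, 0]
                wy = frac[:, 1] if dy else 1.0 - frac[:, 1]
                wz = frac[:, 2] if dz else 1.0 - frac[:, 2]
                corner = grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                value += wx * wy * wz * corner
    return value


def sphere_trace(grid, origins, directions, n_steps=64, eps=1e-3, half_extent=1.0):
    """March each ray by the queried distance until it reaches the zero level set."""
    t = np.zeros(len(origins))
    for _ in range(n_steps):
        d = query_sdf(grid, origins + t[:, None] * directions, half_extent)
        # Rays already within eps of the surface stop advancing.
        t = t + np.where(np.abs(d) > eps, d, 0.0)
    points = origins + t[:, None] * directions
    hit = np.abs(query_sdf(grid, points, half_extent)) <= eps
    return points, hit


grid = make_sphere_sdf_grid()
origins = np.array([[0.0, 0.0, -0.9], [0.9, 0.9, -0.9]])
directions = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
points, hit = sphere_trace(grid, origins, directions)
```

In this toy setup the first ray reaches the sphere and the second misses it. Grid lookups like `query_sdf` avoid evaluating a deep network at every sample, which is one plausible reason a voxel-grid representation is cheaper to optimize; the instability of the naive combination and the learned prior that addresses it are not modeled here.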