We present InstantGeoAvatar, a method for efficiently and effectively learning the detailed 3D geometry and appearance of animatable implicit human avatars from monocular video. Our key observation is that optimizing a hash grid encoding to represent a signed distance function (SDF) of the human subject is prone to instabilities and poor local minima. We thus propose a principled geometry-aware SDF regularization scheme that fits seamlessly into the volume rendering pipeline and adds negligible computational overhead. Our regularization scheme significantly outperforms previous approaches for training SDFs on hash grids. We obtain competitive results in geometry reconstruction and novel view synthesis in as little as five minutes of training time, compared with the several hours required by previous work. InstantGeoAvatar represents a substantial step towards interactive reconstruction of virtual avatars.