Neural radiance fields can reconstruct high-quality drivable human avatars but are expensive to train and render. To reduce this cost, we propose Animatable 3D Gaussian, which learns human avatars from input images and poses. We extend 3D Gaussians to dynamic human scenes by modeling a set of skinned 3D Gaussians and a corresponding skeleton in canonical space, and deforming the 3D Gaussians into posed space according to the input poses. We introduce hash-encoded shape and appearance to speed up training, and propose time-dependent ambient occlusion to achieve high-quality reconstructions in scenes containing complex motions and dynamic shadows. On both novel view synthesis and novel pose synthesis tasks, our method outperforms existing methods in training time, rendering speed, and reconstruction quality. Our method extends easily to multi-human scenes and achieves comparable novel view synthesis results on a scene with ten people after only 25 seconds of training.
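The canonical-to-posed deformation of skinned Gaussians can be illustrated with linear blend skinning (LBS), the standard way to drive skinned geometry with a skeleton. The sketch below is hypothetical and is not the authors' implementation: it assumes each Gaussian carries per-bone skinning weights and that per-bone rigid transforms (rotation `R_j`, translation `t_j`, already composed with the inverse bind pose) are given. The helper names `blend_transforms` and `deform_gaussian` are illustrative. Under these assumptions, the mean is moved by the blended affine transform A = sum_j w_j [R_j | t_j], and the covariance is mapped as A_R Sigma A_R^T, so an elongated Gaussian rotates with its bone.

```python
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]


def matmul3(a: Mat3, b: Mat3) -> Mat3:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose3(a: Mat3) -> Mat3:
    """Transpose a 3x3 matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def matvec3(a: Mat3, v: Vec3) -> Vec3:
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def blend_transforms(
    weights: Sequence[float],
    bone_rotations: Sequence[Mat3],
    bone_translations: Sequence[Vec3],
) -> Tuple[Mat3, Vec3]:
    """Hypothetical LBS step: weighted sum of per-bone affine transforms."""
    R = [[0.0] * 3 for _ in range(3)]
    t = [0.0] * 3
    for w, Rj, tj in zip(weights, bone_rotations, bone_translations):
        for i in range(3):
            t[i] += w * tj[i]
            for k in range(3):
                R[i][k] += w * Rj[i][k]
    return R, t


def deform_gaussian(
    mean: Vec3,
    cov: Mat3,
    weights: Sequence[float],
    bone_rotations: Sequence[Mat3],
    bone_translations: Sequence[Vec3],
) -> Tuple[Vec3, Mat3]:
    """Move one canonical-space Gaussian into posed space (assumed LBS model)."""
    R, t = blend_transforms(weights, bone_rotations, bone_translations)
    # Posed mean: blended affine transform applied to the canonical mean.
    posed_mean = [m + ti for m, ti in zip(matvec3(R, mean), t)]
    # Posed covariance: only the linear part acts on the covariance.
    posed_cov = matmul3(matmul3(R, cov), transpose3(R))
    return posed_mean, posed_cov


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    mean = [1.0, 0.0, 0.0]
    cov = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # elongated along x
    # Gaussian fully bound to bone 1 (rotated 90 degrees about z, lifted by 1 in z).
    m, c = deform_gaussian(mean, cov, [0.0, 1.0], [identity, rot_z_90],
                           [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    print(m)  # [0.0, 1.0, 1.0]
```

Blending matrices linearly keeps the deformation cheap and differentiable, but a weighted sum of rotations is generally not a rotation, so Gaussians near joints with mixed weights can shrink. This is a known property of plain LBS, not a claim about how the paper handles it.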