Neural radiance fields can reconstruct high-quality drivable human avatars, but they are expensive to train and render and are ill-suited to multi-human scenes with complex shadows. To reduce these costs, we propose Animatable 3D Gaussian, which learns human avatars from input images and poses. We extend 3D Gaussians to dynamic human scenes by modeling a set of skinned 3D Gaussians and a corresponding skeleton in canonical space, then deforming the 3D Gaussians to posed space according to the input poses. We introduce a multi-head hash encoder for pose-dependent shape and appearance, and a time-dependent ambient occlusion module, to achieve high-quality reconstruction in scenes containing complex motions and dynamic shadows. On both novel view synthesis and novel pose synthesis tasks, our method achieves higher reconstruction quality than InstantAvatar with less training time (1/60), less GPU memory (1/4), and faster rendering speed (7x). Our method extends easily to multi-human scenes, achieving comparable novel view synthesis results on a scene with ten people after only 25 seconds of training.
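The canonical-to-posed deformation described above follows the usual linear blend skinning pattern: each Gaussian center carries per-bone skinning weights and is transformed by the weighted blend of bone transforms. The sketch below is a minimal illustration of that idea, not the paper's implementation; the function name, argument shapes, and use of NumPy are all assumptions.

```python
import numpy as np

def deform_gaussians(means, skin_weights, bone_transforms):
    """Deform canonical 3D Gaussian centers to posed space via linear
    blend skinning (hypothetical sketch, not the paper's actual code).

    means:           (N, 3) canonical Gaussian centers
    skin_weights:    (N, B) per-Gaussian skinning weights (rows sum to 1)
    bone_transforms: (B, 4, 4) canonical-to-posed bone transforms
    """
    n = means.shape[0]
    # Lift centers to homogeneous coordinates: (N, 4)
    homog = np.concatenate([means, np.ones((n, 1))], axis=1)
    # Blend the bone transforms per Gaussian: (N, 4, 4)
    blended = np.einsum("nb,bij->nij", skin_weights, bone_transforms)
    # Apply each blended transform to its center: (N, 4)
    posed = np.einsum("nij,nj->ni", blended, homog)
    return posed[:, :3]
```

In a full pipeline the same blended transform would also rotate each Gaussian's covariance (or its rotation factor), and the pose-dependent corrections from the multi-head hash encoder would be applied before splatting; those steps are omitted here for brevity.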