This paper addresses the challenge of reconstructing an animatable human model from multi-view video. Several recent works decompose a non-rigidly deforming scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, which enables them to learn the dynamic scene from images. However, these works represent the deformation field as a translational vector field or an SE(3) field, which makes the optimization highly under-constrained. Moreover, such representations cannot be explicitly controlled by input motions. Instead, we introduce a pose-driven deformation field based on the linear blend skinning algorithm, which combines a blend weight field with the 3D human skeleton to produce observation-to-canonical correspondences. Because 3D human skeletons are more readily observable, they can regularize the learning of the deformation field. Furthermore, the pose-driven deformation field can be controlled by input skeletal motions to generate new deformation fields that animate the canonical human model. Experiments show that our approach significantly outperforms recent human modeling methods. The code is available at https://zju3dv.github.io/animatable_nerf/.
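To make the skinning-based correspondence concrete, the sketch below shows standard linear blend skinning (LBS) and its inverse for a single point: a posed point is x_obs = (Σ_k w_k G_k) x_can, so the observation-to-canonical map is x_can = (Σ_k w_k G_k)^{-1} x_obs. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-point blend weights are fixed numbers standing in for a learned blend weight field, each bone transform G_k is an affine map (R, t), and the helper names (`blend_transforms`, `lbs_forward`, `lbs_inverse`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical types for illustration: a bone transform is (3x3 linear part, translation).
Mat3 = List[List[float]]
Vec3 = List[float]
Bone = Tuple[Mat3, Vec3]


def blend_transforms(weights: Sequence[float], bones: Sequence[Bone]) -> Tuple[Mat3, Vec3]:
    """Weighted sum of bone transforms, sum_k w_k * G_k, as (A, t)."""
    A = [[0.0] * 3 for _ in range(3)]
    t = [0.0] * 3
    for w, (R, tr) in zip(weights, bones):
        for i in range(3):
            t[i] += w * tr[i]
            for j in range(3):
                A[i][j] += w * R[i][j]
    return A, t


def apply(A: Mat3, t: Vec3, x: Vec3) -> Vec3:
    """Affine map x -> A x + t."""
    return [sum(A[i][j] * x[j] for j in range(3)) + t[i] for i in range(3)]


def inverse3(A: Mat3) -> Mat3:
    """Inverse of a 3x3 matrix via the adjugate; raises if (near-)singular."""
    (a, b, c), (d, e, f), (g, h, k) = A
    det = a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("blended transform is singular")
    return [
        [(e * k - f * h) / det, (c * h - b * k) / det, (b * f - c * e) / det],
        [(f * g - d * k) / det, (a * k - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def lbs_forward(x_can: Vec3, weights: Sequence[float], bones: Sequence[Bone]) -> Vec3:
    """Canonical -> observation space: x_obs = (sum_k w_k G_k) x_can."""
    A, t = blend_transforms(weights, bones)
    return apply(A, t, x_can)


def lbs_inverse(x_obs: Vec3, weights: Sequence[float], bones: Sequence[Bone]) -> Vec3:
    """Observation -> canonical space: x_can = (sum_k w_k G_k)^{-1} x_obs."""
    A, t = blend_transforms(weights, bones)
    A_inv = inverse3(A)
    # Undo the blended translation, then the blended linear part.
    return apply(A_inv, [0.0, 0.0, 0.0], [x_obs[i] - t[i] for i in range(3)])


if __name__ == "__main__":
    identity: Bone = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
    # 90-degree rotation about z plus a unit translation along x.
    rot_z: Bone = ([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [1.0, 0.0, 0.0])
    bones = [identity, rot_z]
    weights = [0.5, 0.5]  # stand-in for a learned blend weight field
    x_obs = lbs_forward([1.0, 2.0, 3.0], weights, bones)
    print(x_obs)                                # [0.0, 1.5, 3.0]
    print(lbs_inverse(x_obs, weights, bones))   # [1.0, 2.0, 3.0]
```

Because the deformation is expressed through bone transforms, substituting a different set of transforms in `lbs_forward` moves the same canonical point to a new posed location; this is the sense in which skeletal motion can drive the deformation field.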