High-fidelity 3D human models can now be learned directly from videos, typically by combining a template-based surface model with neural representations. However, obtaining a template surface requires expensive multi-view capture systems, laser scans, or strictly controlled conditions. Previous methods avoid using a template but rely on a costly or ill-posed mapping from observation space to canonical space. We propose a hybrid point-based representation for reconstructing animatable characters that does not require an explicit surface model and still generalizes to novel poses. For a given video, our method automatically produces an explicit set of 3D points representing approximate canonical geometry, and learns an articulated deformation model that produces pose-dependent point transformations. The points serve both as a scaffold for high-frequency neural features and as an anchor for efficiently mapping between observation and canonical space. We demonstrate on established benchmarks that our representation overcomes limitations of prior work operating in either canonical or observation space. Moreover, our automatic point extraction approach enables learning models of human and animal characters alike, matching the performance of methods that use rigged surface templates despite being more general. Project website: https://lemonatsu.github.io/npc/
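To make the idea of pose-dependent point transformations concrete, the sketch below maps a set of canonical 3D points into observation space. The abstract does not specify the deformation model, so this uses plain linear blend skinning as a stand-in; it is not the authors' implementation. The function name `deform_points`, its arguments, and the optional `pose_offsets` term are all hypothetical and chosen only for illustration.

```python
import numpy as np

def deform_points(canonical_pts, skin_weights, bone_transforms, pose_offsets=None):
    """Map canonical points to observation space with linear blend skinning.

    Assumed (hypothetical) inputs:
      canonical_pts:   (N, 3) points in canonical space
      skin_weights:    (N, B) per-point bone weights, each row summing to 1
      bone_transforms: (B, 4, 4) rigid canonical-to-observation bone transforms
      pose_offsets:    optional (N, 3) offsets added in canonical space
    """
    pts = canonical_pts if pose_offsets is None else canonical_pts + pose_offsets
    # Homogeneous coordinates, shape (N, 4).
    homo = np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)
    # Blend the bone transforms per point, shape (N, 4, 4).
    blended = np.einsum("nb,bij->nij", skin_weights, bone_transforms)
    # Apply each blended transform to its point.
    posed = np.einsum("nij,nj->ni", blended, homo)
    return posed[:, :3]

# Toy example: bone 0 is the identity, bone 1 translates by +1 along x.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
weights = np.array([[0.5, 0.5], [1.0, 0.0]])
transforms = np.stack([np.eye(4), np.eye(4)])
transforms[1, :3, 3] = [1.0, 0.0, 0.0]
posed = deform_points(pts, weights, transforms)
```

In this toy case the first point is influenced equally by both bones and moves half a unit along x, while the second follows only the identity bone and stays in place. Because the points are explicit, the same per-point transforms could in principle be reused as anchors when mapping query locations back toward canonical space, though how that is done in practice is not described in the abstract.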