High-fidelity 3D human models can now be learned directly from videos, typically by combining a template-based surface model with neural representations. However, obtaining a template surface requires expensive multi-view capture systems, laser scans, or strictly controlled conditions. Previous template-free methods instead rely on a costly or ill-posed mapping from observation space to canonical space. We propose a hybrid point-based representation for reconstructing animatable characters that does not require an explicit surface model, while generalizing to novel poses. For a given video, our method automatically produces an explicit set of 3D points representing approximate canonical geometry, and learns an articulated deformation model that produces pose-dependent point transformations. The points serve as both a scaffold for high-frequency neural features and an anchor for efficiently mapping between observation and canonical space. We demonstrate on established benchmarks that our representation overcomes limitations of prior work operating in either canonical or observation space. Moreover, our automatic point extraction approach enables learning models of human and animal characters alike, matching the performance of methods that use rigged surface templates despite being more general. Project website: https://lemonatsu.github.io/npc/
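To make the idea of pose-dependent point transformations concrete, here is a minimal sketch of how canonical points could be moved into observation space. The abstract does not specify the form of the deformation model, so this sketch assumes linear blend skinning with an optional per-point offset; the function names (`rot_z`, `apply_rigid`, `deform_points`), the fixed skinning weights, and the toy two-bone setup are all hypothetical and are not taken from the authors' implementation. Plain Python is used so the example is self-contained.

```python
import math

def rot_z(angle):
    """3x3 rotation matrix about the z-axis (toy stand-in for a bone rotation)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_rigid(R, t, p):
    """Apply the rigid transform x -> R x + t to a single 3D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def deform_points(canonical_pts, weights, bone_rots, bone_trans, offsets=None):
    """Map canonical points to observation space (assumed: linear blend skinning).

    canonical_pts: N points, each [x, y, z], in canonical space
    weights:       N rows of B skinning weights, each row summing to 1
    bone_rots:     B rotation matrices (3x3), one per bone
    bone_trans:    B translation vectors, one per bone
    offsets:       optional N per-point offsets added in canonical space,
                   standing in for a learned pose-dependent correction
    """
    posed = []
    for n, p in enumerate(canonical_pts):
        if offsets is not None:
            p = [p[i] + offsets[n][i] for i in range(3)]
        out = [0.0, 0.0, 0.0]
        # Blend the per-bone rigid transforms of this point by its weights.
        for w, R, t in zip(weights[n], bone_rots, bone_trans):
            q = apply_rigid(R, t, p)
            out = [out[i] + w * q[i] for i in range(3)]
        posed.append(out)
    return posed

# Toy setup: bone 0 stays fixed, bone 1 rotates 90 degrees about z.
canonical = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
bone_rots = [rot_z(0.0), rot_z(math.pi / 2)]
bone_trans = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
posed = deform_points(canonical, weights, bone_rots, bone_trans)
```

In this toy case the first point follows the fixed bone and stays in place, the second follows the rotated bone, and the third lands halfway between the two. Because each point carries its own transform, the same explicit point set can act as an anchor for moving between canonical and observation space without inverting a learned mapping.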