This paper introduces Motion-oriented Compositional Neural Radiance Fields (MoCo-NeRF), a framework designed to perform free-viewpoint rendering of monocular human videos via a novel non-rigid motion modeling approach. In the context of dynamic clothed humans, complex cloth dynamics generate non-rigid motions that are intrinsically distinct from skeletal articulations and critically important for rendering quality. The conventional approach models non-rigid motions as spatial (3D) deviations in addition to skeletal transformations. However, this approach is either time-consuming or struggles to achieve optimal quality, owing to its high learning complexity and lack of direct supervision. To address this problem, we propose modeling non-rigid motions as radiance residual fields, which benefit from more direct color supervision during rendering and use the rigid radiance field as a prior to reduce the complexity of the learning process. Our approach employs a single multiresolution hash encoding (MHE) to concurrently learn the canonical T-pose representation from rigid skeletal motions and the radiance residual field for non-rigid motions. Additionally, to further improve both training efficiency and usability, we extend MoCo-NeRF to support simultaneous training of multiple subjects within a single framework, enabled by our effective design for modeling non-rigid motions. This scalability is achieved by integrating a global MHE and learnable identity codes alongside multiple local MHEs. We present extensive results on ZJU-MoCap and MonoCap, clearly demonstrating state-of-the-art performance in both single- and multi-subject settings. The code and model will be made publicly available at the project page: https://stevejaehyeok.github.io/publications/moco-nerf.
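The core idea above can be illustrated with a minimal sketch. This is a hypothetical toy, not the authors' implementation: the `hash_encode` stand-in, the pose code, and the network heads are all assumptions. It only shows the compositional structure the abstract describes, in which a pose-conditioned radiance residual is added to the rigid canonical radiance, so the residual receives direct color supervision from the rendering loss while the rigid field serves as a prior, and both branches share one encoding.

```python
# Hypothetical sketch of the radiance-residual composition (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def hash_encode(x, num_levels=4, features_per_level=2):
    # Stand-in for a multiresolution hash encoding (MHE): deterministic
    # pseudo-random per-level features. A real MHE uses trainable hash
    # tables over multiple grid resolutions.
    feats = []
    for level in range(num_levels):
        res = 2 ** (level + 2)
        cell = np.floor(x * res).astype(np.int64)
        h = int((cell * np.array([1, 2654435761, 805459861])).sum() % 997)
        feats.append(np.random.default_rng(h).standard_normal(features_per_level))
    return np.concatenate(feats)

def rigid_radiance(feat):
    # Canonical T-pose radiance, driven only by rigid skeletal motion.
    return np.tanh(feat[:3])  # toy RGB in [-1, 1]

def residual_radiance(feat, pose_code):
    # Pose-conditioned radiance residual modeling non-rigid cloth motion;
    # its small scale reflects that it refines the rigid prior.
    return 0.1 * np.tanh(feat[3:6] + pose_code)

x = np.array([0.3, 0.5, 0.7])           # a canonical-space sample point
pose_code = rng.standard_normal(3)      # per-frame pose embedding (assumed)

feat = hash_encode(x)                   # one shared encoding feeds both branches
color = rigid_radiance(feat) + residual_radiance(feat, pose_code)
```

Because the residual is expressed directly in radiance (color) space rather than as a 3D spatial deviation, a photometric loss on `color` supervises it without any intermediate geometric warping step.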