This paper introduces Motion-oriented Compositional Neural Radiance Fields (MoCo-NeRF), a framework designed to perform free-viewpoint rendering of monocular human videos via a novel non-rigid motion modeling approach. In the context of dynamic clothed humans, complex cloth dynamics generate non-rigid motions that are intrinsically distinct from skeletal articulations and critically important for rendering quality. The conventional approach models non-rigid motions as spatial (3D) deviations in addition to skeletal transformations. However, this approach is either time-consuming or struggles to achieve optimal quality, owing to its high learning complexity and lack of direct supervision. To address this problem, we propose modeling non-rigid motions as radiance residual fields, which benefit from more direct color supervision during rendering and use the rigid radiance field as a prior to reduce the complexity of the learning process. Our approach employs a single multiresolution hash encoding (MHE) to concurrently learn the canonical T-pose representation from rigid skeletal motions and the radiance residual field for non-rigid motions. Additionally, to further improve both training efficiency and usability, we extend MoCo-NeRF to support simultaneous training of multiple subjects within a single framework, enabled by our effective design for modeling non-rigid motions. This scalability is achieved by integrating a global MHE and learnable identity codes alongside multiple local MHEs. We present extensive results on ZJU-MoCap and MonoCap, clearly demonstrating state-of-the-art performance in both single- and multi-subject settings. The code and model will be made publicly available at the project page: https://stevejaehyeok.github.io/publications/moco-nerf.
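The core idea above can be illustrated with a minimal sketch. This is a hypothetical toy, not the authors' implementation: the `hash_encode` stand-in, the pose code, and the network heads are all assumptions. It only shows the compositional structure the abstract describes, in which a pose-conditioned radiance residual is added to the rigid canonical radiance, so the residual receives direct color supervision from the rendering loss while the rigid field serves as a prior, and both branches share one encoding.

```python
# Hypothetical sketch of the radiance-residual composition (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def hash_encode(x, num_levels=4, features_per_level=2):
    # Stand-in for a multiresolution hash encoding (MHE): deterministic
    # pseudo-random per-level features. A real MHE uses trainable hash
    # tables over multiple grid resolutions.
    feats = []
    for level in range(num_levels):
        res = 2 ** (level + 2)
        cell = np.floor(x * res).astype(np.int64)
        h = int((cell * np.array([1, 2654435761, 805459861])).sum() % 997)
        feats.append(np.random.default_rng(h).standard_normal(features_per_level))
    return np.concatenate(feats)

def rigid_radiance(feat):
    # Canonical T-pose radiance, driven only by rigid skeletal motion.
    return np.tanh(feat[:3])  # toy RGB in [-1, 1]

def residual_radiance(feat, pose_code):
    # Pose-conditioned radiance residual modeling non-rigid cloth motion;
    # its small scale reflects that it refines the rigid prior.
    return 0.1 * np.tanh(feat[3:6] + pose_code)

x = np.array([0.3, 0.5, 0.7])           # a canonical-space sample point
pose_code = rng.standard_normal(3)      # per-frame pose embedding (assumed)

feat = hash_encode(x)                   # one shared encoding feeds both branches
color = rigid_radiance(feat) + residual_radiance(feat, pose_code)
```

Because the residual is expressed directly in radiance (color) space rather than as a 3D spatial deviation, a photometric loss on `color` supervises it without any intermediate geometric warping step.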