Rendering moving human bodies from free viewpoints using only a monocular video is a challenging problem. The information is too sparse to model complicated human body structures and motions across both the view and pose dimensions. Neural radiance fields (NeRF) have shown great power in novel view synthesis and have been applied to human body rendering. However, most current NeRF-based methods incur high costs for both training and rendering, which impedes wide application in real-life scenarios. In this paper, we propose a rendering framework that can learn moving human body structures extremely quickly from a monocular video. The framework integrates neural fields and neural voxels. In particular, a set of generalizable neural voxels is constructed. Pretrained on various human bodies, these general voxels represent a basic skeleton and provide strong geometric priors. During fine-tuning, individual voxels are constructed to learn differential textures, complementing the general voxels. Learning a novel body can thus be further accelerated, taking only a few minutes. Our method shows significantly higher training efficiency than previous methods while maintaining similar rendering quality. Project page: https://taoranyi.com/gneuvox
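The interplay between the general and individual voxels can be illustrated with a minimal NumPy sketch. The abstract does not state how the two voxel sets are combined, so the additive combination below is an assumption, as are the grid shapes and the hypothetical names `trilinear_sample` and `query_features`. It is not the authors' implementation, only one plausible reading of "complementary" voxels.

```python
import numpy as np

def trilinear_sample(grid, pts):
    """Trilinearly interpolate a (D, H, W, C) feature grid at points in [0, 1]^3."""
    dims = np.array(grid.shape[:3])
    # Map normalized coordinates to continuous voxel indices.
    idx = np.clip(pts, 0.0, 1.0) * (dims - 1)
    # Lower corner of the enclosing cell, kept inside the grid.
    lo = np.minimum(np.floor(idx).astype(int), dims - 2)
    frac = idx - lo
    out = np.zeros((pts.shape[0], grid.shape[3]))
    # Accumulate the weighted contribution of the 8 cell corners.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = frac[:, 0] if dx else 1.0 - frac[:, 0]
                wy = frac[:, 1] if dy else 1.0 - frac[:, 1]
                wz = frac[:, 2] if dz else 1.0 - frac[:, 2]
                corner = grid[lo[:, 0] + dx, lo[:, 1] + dy, lo[:, 2] + dz]
                out += (wx * wy * wz)[:, None] * corner
    return out

def query_features(general_grid, individual_grid, pts):
    # Assumption: general (pretrained) and individual (per-subject) features
    # are summed; the abstract only says they are complementary.
    return trilinear_sample(general_grid, pts) + trilinear_sample(individual_grid, pts)

rng = np.random.default_rng(0)
general = rng.normal(size=(8, 8, 8, 4))   # stand-in for pretrained general voxels
individual = np.zeros((8, 8, 8, 4))       # per-subject voxels, zero-initialized
pts = rng.uniform(size=(5, 3))            # query points in normalized coordinates
feats = query_features(general, individual, pts)
```

With the individual grid initialized to zero, the queried features equal those of the general grid alone, so under this assumption fine-tuning would start from the pretrained prior and only learn a per-subject residual.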