It is now possible to reconstruct dynamic human motion and shape from a sparse set of cameras using Neural Radiance Fields (NeRF) driven by an underlying skeleton. However, modeling the deformation of cloth and skin in relation to skeleton pose remains a challenge. Unlike existing avatar models that are learned implicitly or rely on a proxy surface, our approach is motivated by the observation that different poses require different frequency assignments. Neglecting this distinction yields noisy artifacts in smooth areas or blurs fine-grained texture and shape details in sharp regions. We develop a two-branch neural network that is adaptive and explicit in the frequency domain. The first branch is a graph neural network that takes skeleton pose as input and models correlations among body parts locally. The second branch combines these correlation features into a set of global frequencies and then modulates the feature encoding. Our experiments demonstrate that our network outperforms state-of-the-art methods in preserving details and in generalization.
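The two-branch idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single message-passing step, the mean pooling, the softplus mapping to frequencies, the sinusoidal encoding, and all names (`graph_branch`, `frequency_branch`, `modulated_encoding`), dimensions, and random weights are assumptions chosen only to make the data flow concrete.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetric-normalized adjacency with self-loops for a skeleton graph."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_branch(pose, adj, w):
    """Branch 1 (assumed form): one message-passing step between neighbouring joints."""
    return np.maximum(adj @ pose @ w, 0.0)  # ReLU, shape (J, H)

def frequency_branch(joint_feats, w_freq):
    """Branch 2 (assumed form): pool joint features into K positive global frequencies."""
    pooled = joint_feats.mean(axis=0)          # (H,)
    return np.logaddexp(0.0, pooled @ w_freq)  # softplus, shape (K,)

def modulated_encoding(x, freqs):
    """Encode query points x (N, 3) with the pose-dependent frequencies (K,)."""
    phase = x[:, :, None] * freqs[None, None, :]                    # (N, 3, K)
    enc = np.concatenate([np.sin(phase), np.cos(phase)], axis=-1)   # (N, 3, 2K)
    return enc.reshape(x.shape[0], -1)                              # (N, 6K)

# Toy setup: a 4-joint chain skeleton with random pose features and weights.
rng = np.random.default_rng(0)
num_joints, pose_dim, hidden, num_freqs = 4, 6, 8, 5
edges = [(0, 1), (1, 2), (2, 3)]
adj = normalized_adjacency(edges, num_joints)
pose = rng.normal(size=(num_joints, pose_dim))
w = rng.normal(size=(pose_dim, hidden))
w_freq = rng.normal(size=(hidden, num_freqs))

feats = graph_branch(pose, adj, w)         # local correlations among body parts
freqs = frequency_branch(feats, w_freq)    # global, pose-dependent frequencies
points = rng.normal(size=(10, 3))
encoding = modulated_encoding(points, freqs)
```

In this sketch, changing `pose` changes `freqs` and therefore the encoding of the same query points, which is the sense in which the encoding is pose-adaptive and explicit in the frequency domain.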