We present AvatarReX, a new method for learning NeRF-based full-body avatars from video data. The learnt avatar not only provides expressive control of the body, hands and face together, but also supports real-time animation and rendering. To this end, we propose a compositional avatar representation in which the body, hands and face are modeled separately, so that the structural prior from parametric mesh templates is properly utilized without compromising representation flexibility. Furthermore, we disentangle the geometry and appearance of each part. Building on these technical designs, we propose a dedicated deferred rendering pipeline that runs at real-time frame rates to synthesize high-quality free-view images. The disentanglement of geometry and appearance also allows us to design a two-pass training strategy that combines volume rendering and surface rendering. In this way, patch-level supervision can be applied to force the network to learn sharp appearance details on the basis of geometry estimation. Overall, our method enables automatic construction of expressive full-body avatars with real-time rendering capability, and can generate photo-realistic images with dynamic details for novel body motions and facial expressions.
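For readers unfamiliar with the volume rendering pass mentioned above, the following is a minimal sketch of the standard NeRF-style alpha-compositing quadrature along a single ray. It is a generic illustration, not the AvatarReX pipeline: the function name `composite_along_ray` and its arguments are assumptions made for this example, and the densities and colors would normally come from a learned network rather than hand-written lists.

```python
import math


def composite_along_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray (standard NeRF quadrature).

    densities: per-sample volume density sigma_i (non-negative floats)
    colors:    per-sample RGB triples
    deltas:    distance between consecutive samples along the ray
    Returns the composited RGB pixel and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed so far
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, rgb, delta in zip(densities, colors, deltas):
        # Opacity contributed by this sample interval.
        alpha = 1.0 - math.exp(-sigma * delta)
        # Contribution weight: light reaching the sample times its opacity.
        weight = transmittance * alpha
        weights.append(weight)
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        # Light remaining after passing through this sample.
        transmittance *= 1.0 - alpha
    return pixel, weights


# Hypothetical ray: empty space, then a nearly opaque surface, then more matter.
pixel, weights = composite_along_ray(
    densities=[0.0, 1e9, 5.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[0.1, 0.1, 0.1],
)
```

In this toy ray the second sample is effectively opaque, so it receives almost all of the weight and hides the sample behind it. That concentration of weight near a surface is what makes it plausible to hand geometry over to a surface rendering pass.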