Tremendous efforts have been made to learn animatable and photorealistic human avatars. To this end, both explicit and implicit 3D representations have been heavily studied for holistic modeling and capture of the whole human (e.g., body, clothing, face and hair), but neither representation is an optimal choice in terms of representation efficacy, since different parts of the human avatar have different modeling desiderata. For example, meshes are generally not suitable for modeling clothing and hair. Motivated by this, we present Disentangled Avatars (DELTA), which models humans with hybrid explicit-implicit 3D representations. DELTA takes a monocular RGB video as input and produces a human avatar with separate body and clothing/hair layers. Specifically, we demonstrate two important applications of DELTA: in the first, we disentangle the human body and clothing; in the second, we disentangle the face and hair. To do so, DELTA represents the body or face with an explicit mesh-based parametric 3D model and the clothing or hair with an implicit neural radiance field. To make this possible, we design an end-to-end differentiable renderer that integrates meshes into volumetric rendering, enabling DELTA to learn directly from monocular videos without any 3D supervision. Finally, we show how these two applications can be easily combined to model full-body avatars, such that the hair, face, body and clothing are fully disentangled yet jointly rendered. Such disentanglement enables hair and clothing transfer to arbitrary body shapes. We empirically validate the effectiveness of DELTA's disentanglement by demonstrating its promising performance on disentangled reconstruction, virtual clothing try-on and hairstyle transfer. To facilitate future research, we also release an open-source pipeline for the study of hybrid human avatar modeling.
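To illustrate what integrating a mesh into volumetric rendering can look like, the following is a minimal NumPy sketch for a single camera ray. It is not DELTA's renderer: the function name `composite_ray`, its arguments, and the choice to treat the mesh as a fully opaque surface that truncates the ray are all assumptions made for illustration. The radiance field is composited with the standard quadrature (per-bin opacity from density, weighted by accumulated transmittance), and whatever transmittance remains at the ray-mesh intersection is assigned to the mesh color.

```python
import numpy as np


def composite_ray(t_edges, sigmas, colors, mesh_depth=None, mesh_color=None):
    """Composite one ray through a density field with an optional opaque mesh hit.

    All names here are hypothetical and chosen for illustration only.

    t_edges:    (N+1,) increasing bin edges along the ray.
    sigmas:     (N,) non-negative volume densities, one per bin.
    colors:     (N, 3) RGB radiance, one per bin.
    mesh_depth: depth of the ray-mesh intersection, or None if the ray misses.
    mesh_color: (3,) RGB of the mesh surface at the intersection.

    Returns (rgb, volume_opacity, mesh_weight).
    """
    t_edges = np.asarray(t_edges, dtype=np.float64)
    sigmas = np.asarray(sigmas, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)

    near, far = t_edges[:-1], t_edges[1:]
    if mesh_depth is not None:
        # Assumption: the mesh is opaque, so nothing behind it contributes.
        far = np.minimum(far, mesh_depth)
    # Bins entirely behind the mesh get zero length and thus zero opacity.
    deltas = np.clip(far - near, 0.0, None)

    # Standard volume rendering quadrature.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # trans[i] is the transmittance reaching bin i; trans[-1] is what is left
    # after the last contributing bin.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)])
    weights = trans[:-1] * alphas

    rgb = (weights[:, None] * colors).sum(axis=0)
    volume_opacity = weights.sum()

    mesh_weight = 0.0
    if mesh_depth is not None:
        # The mesh surface absorbs all remaining transmittance.
        mesh_weight = trans[-1]
        rgb = rgb + mesh_weight * np.asarray(mesh_color, dtype=np.float64)
    return rgb, volume_opacity, mesh_weight
```

In this sketch the volume weights and the mesh weight sum to one whenever the ray hits the mesh, so a pixel is explained either by the implicit layer (clothing or hair) or by the explicit layer (body or face). Every step is a smooth function of the densities and colors, which is the property a photometric loss on video frames would need in order to train both layers jointly; an actual implementation would use an autodiff framework and a differentiable rasterizer to obtain `mesh_depth` and `mesh_color`.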