Learning Disentangled Avatars with Hybrid 3D Representations

Tremendous effort has gone into learning animatable and photorealistic human avatars. To this end, both explicit and implicit 3D representations have been heavily studied for holistic modeling and capture of the whole human (e.g., body, clothing, face, and hair), but neither representation is optimal in terms of representation efficacy, since different parts of the human avatar have different modeling desiderata. For example, meshes are generally not suitable for modeling clothing and hair. Motivated by this, we present Disentangled Avatars (DELTA), which models humans with hybrid explicit-implicit 3D representations. DELTA takes a monocular RGB video as input and produces a human avatar with separate body and clothing/hair layers. Specifically, we demonstrate two important applications of DELTA: in the first, we disentangle the human body and clothing; in the second, we disentangle the face and hair. To do so, DELTA represents the body or face with an explicit mesh-based parametric 3D model and the clothing or hair with an implicit neural radiance field. To make this possible, we design an end-to-end differentiable renderer that integrates meshes into volumetric rendering, enabling DELTA to learn directly from monocular videos without any 3D supervision. Finally, we show how these two applications can be easily combined to model full-body avatars, such that the hair, face, body, and clothing are fully disentangled yet jointly rendered. Such disentanglement enables hair and clothing transfer to arbitrary body shapes. We empirically validate the effectiveness of DELTA's disentanglement by demonstrating its promising performance on disentangled reconstruction, virtual clothing try-on, and hairstyle transfer. To facilitate future research, we also release an open-source pipeline for the study of hybrid human avatar modeling.
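The abstract does not spell out how meshes are integrated into volumetric rendering. As a rough illustration only, one common way to combine the two along a single camera ray is to accumulate radiance-field samples in front of the mesh and let an opaque mesh surface absorb whatever transmittance remains. The sketch below assumes exactly that; the function name `composite_ray`, its parameters, and the opaque-mesh assumption are hypothetical and are not taken from DELTA. It is a NumPy forward pass, so a trainable version would need an autodiff framework.

```python
import numpy as np

def composite_ray(depths, sigmas, colors, far,
                  mesh_depth=None, mesh_color=None,
                  background=(0.0, 0.0, 0.0)):
    """Composite radiance-field samples with an optional opaque mesh hit.

    Hypothetical sketch, not DELTA's renderer. Assumes `depths` is sorted
    ascending and lies in front of `far`.
    """
    depths = np.asarray(depths, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    colors = np.asarray(colors, dtype=float).reshape(-1, 3)

    # The ray ends at the mesh surface if it is hit before `far`,
    # otherwise at the far plane, where the background is seen.
    if mesh_depth is not None and mesh_depth < far:
        end, tail = mesh_depth, np.asarray(mesh_color, dtype=float)
    else:
        end, tail = far, np.asarray(background, dtype=float)

    # Samples at or behind the end point are occluded and discarded.
    keep = depths < end
    depths, sigmas, colors = depths[keep], sigmas[keep], colors[keep]

    # Standard volume-rendering quadrature: per-sample opacity and
    # the transmittance accumulated in front of each sample.
    deltas = np.diff(np.append(depths, end))
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))
    weights = trans[:-1] * alphas

    # Remaining transmittance trans[-1] is assigned to the mesh/background.
    return weights @ colors + trans[-1] * tail
```

With zero density in front of the mesh the function returns the mesh color unchanged, and with very high density it returns the color of the radiance-field sample, which matches the intuition of a clothing or hair layer occluding the body or face underneath.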

