Creating realistic avatars from a single RGB image is an attractive yet challenging problem. Due to its ill-posed nature, recent works leverage powerful priors from 2D diffusion models pretrained on large datasets. Although 2D diffusion models demonstrate strong generalization capability, they cannot provide multi-view shape priors with guaranteed 3D consistency. We propose Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion. Our key insight is that 2D multi-view diffusion models and 3D reconstruction models provide complementary information to each other, and that by coupling them tightly we can fully leverage the potential of both. We introduce a novel image-conditioned generative 3D Gaussian Splats reconstruction model that leverages the priors from 2D multi-view diffusion models and provides an explicit 3D representation, which in turn guides the 2D reverse sampling process toward better 3D consistency. Experiments show that our proposed framework outperforms state-of-the-art methods and enables the creation of realistic avatars from a single RGB image, achieving high fidelity in both geometry and appearance. Extensive ablations also validate the efficacy of our design: (1) conditioning the generative 3D reconstruction on multi-view 2D priors, and (2) refining the consistency of the sampling trajectory via the explicit 3D representation. Our code and models will be released at https://yuxuan-xue.com/human-3diffusion.
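The coupling described above can be sketched as a reverse-sampling loop in which, at each denoising step, the 2D multi-view model's prediction is replaced by views re-rendered from an explicit shared 3D state. This is a minimal toy illustration of that control flow only: `denoise_2d` and `reconstruct_and_render_3d` are hypothetical stand-ins (simple linear maps on flat vectors), not the paper's actual networks.

```python
import numpy as np

def denoise_2d(x_t, t):
    # Toy stand-in for the 2D multi-view diffusion model's x0 prediction:
    # shrink the noisy sample toward zero as t decreases.
    return x_t * (1.0 - t)

def reconstruct_and_render_3d(views):
    # Toy stand-in for the generative 3D Gaussian Splats reconstruction:
    # average the views and broadcast the result back to every view,
    # mimicking re-rendering from one shared explicit 3D representation.
    return np.tile(views.mean(axis=0, keepdims=True), (views.shape[0], 1))

def coupled_reverse_sampling(x_T, timesteps):
    x_t = x_T
    for t in timesteps:
        x0_2d = denoise_2d(x_t, t)                 # 2D multi-view prior
        x0_3d = reconstruct_and_render_3d(x0_2d)   # refine via explicit 3D
        # DDIM-style deterministic step toward the 3D-consistent prediction
        x_t = x0_3d + t * (x_t - x0_3d)
    return x_t

# 4 views, 8 features each; decreasing noise levels down to t = 0
views = coupled_reverse_sampling(np.random.randn(4, 8), [0.8, 0.5, 0.2, 0.0])
```

Because each step pulls the trajectory toward views re-rendered from a single 3D state, the final views agree exactly in this toy setup, which is the mechanism behind the consistency refinement of the sampling trajectory.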