We present a methodology for conditional control of human shape and pose in pretrained text-to-image diffusion models using a 3D human parametric model (SMPL). Fine-tuning these diffusion models to adhere to new conditions requires large datasets with high-quality annotations, which can be acquired more cost-effectively through synthetic data generation than through real-world collection. However, the domain gap and low scene diversity of synthetic data can compromise the pretrained model's visual fidelity. We propose a domain-adaptation technique that maintains image quality by isolating the synthetically trained conditional information in the classifier-free guidance vector and composing it with another control network that adapts the generated images to the input domain. To achieve SMPL control, we fine-tune a ControlNet-based architecture on the synthetic SURREAL dataset of rendered humans and apply our domain adaptation at generation time. Experiments demonstrate that our model achieves greater shape and pose diversity than the 2D pose-based ControlNet while maintaining visual fidelity and improving stability, demonstrating its usefulness for downstream tasks such as human animation.
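The guidance composition described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the three noise predictions, and the shared guidance scale are assumptions made for clarity. The idea is that the synthetically trained SMPL condition contributes its own direction to the classifier-free guidance vector, which is then summed with the direction from a second, domain-adapting control network instead of conditioning on both jointly.

```python
import numpy as np

def composed_cfg(eps_uncond, eps_smpl, eps_domain, scale=7.5):
    """Hypothetical sketch of composed classifier-free guidance.

    eps_uncond : noise prediction with no conditioning
    eps_smpl   : prediction conditioned on the synthetically trained SMPL control
    eps_domain : prediction conditioned on the domain-adapting control network

    Each condition's guidance direction is isolated as a difference from the
    unconditional prediction, then the directions are composed by summation.
    """
    g_smpl = eps_smpl - eps_uncond      # SMPL shape/pose guidance direction
    g_domain = eps_domain - eps_uncond  # domain-adaptation guidance direction
    return eps_uncond + scale * (g_smpl + g_domain)
```

In a real sampler these arrays would be the denoiser's noise predictions at each diffusion step; separate per-condition scales could also be used to weight pose control against domain adaptation.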