In this work, we propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting. Specifically, given a reference image, we aim to generate new images of the same person by controlling pose and facial expression while keeping the identity unchanged. To this end, we propose a two-stage training strategy that disentangles human motion from appearance (e.g., facial expressions, skin tone, and clothing), consisting of (1) pre-training an appearance-control block and (2) learning appearance-disentangled pose control. Our design enables robust appearance control over generated human images, including the body, facial attributes, and even the background. By leveraging the prior knowledge of image diffusion models, MagicPose generalizes well to unseen human identities and complex poses without additional fine-tuning. Moreover, the proposed model is easy to use and can serve as a plug-in module or extension to Stable Diffusion.