Pose-Guided Person Image Synthesis (PGPIS) aims to synthesize high-quality person images corresponding to target poses while preserving the appearance of the source image. Recently, PGPIS methods based on diffusion models have achieved competitive performance. Most approaches extract representations of the target pose and source image and learn their relationships during the generative model's training. This makes it difficult to learn the semantic relationships between the input and target images and requires increasingly complex model structures to improve generation quality. To address these issues, we propose Fusion embedding for PGPIS using a Diffusion Model (FPDM). Inspired by the successful application of pre-trained CLIP models in text-to-image diffusion models, our method consists of two stages. The first stage trains a fusion embedding of the source image and target pose to align with the target image's embedding. In the second stage, the generative model uses this fusion embedding as a condition to generate the target image. We applied the proposed method to the benchmark datasets DeepFashion and RWTH-PHOENIX-Weather 2014T, and conducted both quantitative and qualitative evaluations, demonstrating state-of-the-art (SOTA) performance. An ablation study of the model structure showed that even a model using only the second stage achieved performance close to other PGPIS SOTA models. The code is available at https://github.com/dhlee-work/FPDM.