We present FashionComposer for compositional fashion image generation. Unlike previous methods, FashionComposer is highly flexible: it takes multi-modal input (i.e., text prompt, parametric human model, garment image, and face image) and supports personalizing the appearance, pose, and figure of the human, as well as assigning multiple garments in one pass. To achieve this, we first develop a universal framework capable of handling diverse input modalities, and construct scaled training data to strengthen the model's compositional capabilities. To accommodate multiple reference images (garments and faces) seamlessly, we organize these references in a single image as an "asset library" and employ a reference UNet to extract their appearance features. To inject the appearance features into the correct pixels of the generated result, we propose subject-binding attention, which binds the appearance features from different "assets" to the corresponding text features. In this way, the model can understand each asset according to its semantics, supporting arbitrary numbers and types of reference images. As a comprehensive solution, FashionComposer also supports many other applications, such as human album generation and diverse virtual try-on tasks.
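To make the subject-binding idea concrete, the attention described above can be sketched as a cross-attention whose keys and values pair each asset's appearance features with the text-token feature it is bound to. The function below is a minimal hypothetical sketch in PyTorch; the binding scheme (adding the bound token embedding to the asset features before concatenation) is an illustrative assumption, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def subject_binding_attention(gen_feats, text_feats, asset_feats, asset_to_token):
    """Hypothetical sketch of subject-binding cross-attention.

    gen_feats:      (B, N, D) features of the image being generated (queries)
    text_feats:     (B, T, D) text-token features from the prompt
    asset_feats:    list of (B, M_i, D) appearance features, one per asset
                    (e.g., extracted by a reference UNet from the asset library)
    asset_to_token: list of token indices; asset i is bound to text token
                    asset_to_token[i] (e.g., the word "dress" for a dress image)
    """
    kv = [text_feats]
    for feats, tok in zip(asset_feats, asset_to_token):
        # Bind appearance to semantics: offset each asset's features by the
        # embedding of its corresponding text token (illustrative choice).
        bound = feats + text_feats[:, tok : tok + 1, :]
        kv.append(bound)
    # Keys/values carry both prompt semantics and per-asset appearance,
    # so attention can route each asset's features to the right pixels.
    kv = torch.cat(kv, dim=1)  # (B, T + sum_i M_i, D)
    return F.scaled_dot_product_attention(gen_feats, kv, kv)  # (B, N, D)
```

Because the keys and values grow with the number of assets, this formulation naturally supports arbitrary numbers and types of reference images in a single pass.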