Image-based fashion design with AI techniques has attracted increasing attention in recent years. We focus on a new fashion design task: transferring the appearance of a reference image onto a clothing image while preserving the structure of the clothing image. The task is challenging because no reference images are available for the newly designed output fashion images. Although diffusion-based image translation and neural style transfer (NST) have enabled flexible style transfer, it is often difficult to realistically maintain the original structure of the image during reverse diffusion, especially when the reference appearance image differs greatly from common clothing appearance. To tackle this issue, we present a novel diffusion model-based, unsupervised, structure-aware transfer method that semantically generates new clothes from a given clothing image and a reference appearance image. Specifically, we decouple the foreground clothing using semantic masks generated automatically from conditioned labels. The mask is then used as guidance in the denoising process to preserve structure information. Moreover, we use a pre-trained vision Transformer (ViT) for both appearance and structure guidance. Our experimental results show that the proposed method outperforms state-of-the-art baseline models, generating more realistic images in the fashion design task. Code and demo can be found at https://github.com/Rem105-210/DiffFashion.
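To illustrate how a foreground mask can guide a denoising step, the following is a minimal sketch and not the implementation from the DiffFashion repository. It assumes the common blending scheme in which pixels inside the mask come from the denoised sample and pixels outside it come from the original clothing image, re-noised to the current noise level. The names `blend_with_mask`, `add_noise`, `masked_denoise_step`, and the `denoise_fn` callable are hypothetical placeholders introduced here for illustration.

```python
import numpy as np


def blend_with_mask(x_generated, x_structure, mask):
    """Keep generated pixels inside the mask and structure-image pixels outside it."""
    mask = mask.astype(x_generated.dtype)
    return mask * x_generated + (1.0 - mask) * x_structure


def add_noise(x0, alpha_bar, rng):
    """Forward-diffuse a clean image x0 to the noise level given by alpha_bar
    (cumulative signal-retention coefficient, as in the standard DDPM forward process)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise


def masked_denoise_step(x_t, clothing_image, mask, alpha_bar_prev, denoise_fn, rng):
    """One mask-guided reverse step (assumed scheme, for illustration only).

    denoise_fn is a hypothetical callable mapping x_t to the model's sample at
    the previous timestep; any real sampler could stand in for it.
    """
    x_prev = denoise_fn(x_t)
    # Re-noise the original clothing image so both sources share the same noise level.
    background = add_noise(clothing_image, alpha_bar_prev, rng)
    # Foreground (mask == 1) is generated; background (mask == 0) is preserved.
    return blend_with_mask(x_prev, background, mask)
```

In this sketch the mask has shape (H, W, 1) and broadcasts over the colour channels, so a single binary clothing mask constrains every channel of the sample.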