Virtual Try-On (VTON) is a highly active line of research with increasing demand. It aims to replace a garment in one image with a garment from another, while preserving person and garment characteristics as well as image fidelity. Current literature takes a supervised approach to the task, which impairs generalization and imposes heavy computation. In this paper, we present a novel zero-shot, training-free method for inpainting a garment by reference. Our approach employs the prior of a diffusion model with no additional training, fully leveraging its native generalization capabilities. The method employs extended attention to transfer image information from the reference image to the target image, overcoming two significant challenges. We first warp the reference garment over the target human using deep features, alleviating "texture sticking". We then leverage the extended attention mechanism with careful masking, eliminating leakage of the reference background and other unwanted influence. Through a user study and qualitative and quantitative comparisons with state-of-the-art approaches, we demonstrate superior image quality and garment preservation on unseen clothing pieces or human figures.
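As a rough illustration of the extended attention with masking described in the abstract, the following is a minimal single-head NumPy sketch, not the paper's implementation. It assumes that target queries attend over the concatenation of target and reference keys/values, and that reference tokens outside a binary garment mask are blocked so that reference background cannot leak into the result. The function name `masked_extended_attention`, the argument names, and the flat token layout are all hypothetical; the abstract does not specify which layers, heads, or query regions the actual method uses.

```python
import numpy as np

def masked_extended_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref, ref_garment_mask):
    """Single-head attention in which target queries see target tokens plus
    reference tokens, with reference tokens outside the garment mask blocked.

    Shapes (assumed): q_tgt, k_tgt, v_tgt are (n_tgt, d); k_ref, v_ref are
    (n_ref, d); ref_garment_mask is (n_ref,), True on garment tokens.
    """
    d = q_tgt.shape[-1]
    # Extend the key/value set: target tokens first, then reference tokens.
    k = np.concatenate([k_tgt, k_ref], axis=0)
    v = np.concatenate([v_tgt, v_ref], axis=0)
    logits = q_tgt @ k.T / np.sqrt(d)  # (n_tgt, n_tgt + n_ref)

    # Target tokens are always visible; reference tokens only inside the mask.
    visible = np.concatenate(
        [np.ones(k_tgt.shape[0], dtype=bool), np.asarray(ref_garment_mask, dtype=bool)]
    )
    logits = np.where(visible[None, :], logits, -np.inf)

    # Numerically stable softmax; blocked columns get exactly zero weight.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Because blocked reference tokens receive zero attention weight, changing their values has no effect on the output, which is the property the masking is meant to provide. A fuller variant would presumably also restrict which target queries may read from the reference (for example, only those inside the inpainting region), but that detail is an assumption beyond what the abstract states.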