MOFA-VTON: More Fashion Possibilities with Fine-Grained Adaptations in Virtual Try-On

Virtual try-on aims to fit an in-shop clothing image onto a specific human body. An ideal virtual try-on method should offer diverse and flexible dressing options that reflect the varied wearing styles seen in real life and accommodate individual preferences and fashion aspirations. However, current methods predominantly replace the original clothing directly with the target clothing, following the same dressing pattern. This limited control over clothing adaptation can lead to fixed and monotonous try-on outputs. To explore More Fashion Possibilities with Fine-Grained Adaptations in Virtual Try-On, we propose a novel virtual try-on method, termed MOFA-VTON, which lets users adjust clothing adaptations in try-on results through simple sketches. Specifically, we first design a mask construction strategy that transforms user-drawn curve sketches into a dual-region mask, which replaces the traditional clothing-agnostic mask and provides fine-grained layout guidance for the subsequent generation process. Further, we propose layout adjustment blocks that use the cross-attention mechanism to independently learn layout correspondences for the upper and lower regions of the human body, refining the spatial arrangement of the two regions. Together, these components enable flexible and fine-grained adaptations of the target clothing, overcoming the constraints of a fixed layout. Extensive experiments on the VITON-HD and DressCode datasets demonstrate that MOFA-VTON outperforms previous state-of-the-art methods and provides more fashion possibilities for virtual try-on.
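
As a rough illustration of the dual-region mask idea, the sketch below splits a binary clothing-agnostic region into an upper and a lower part along a user-drawn curve. The abstract does not specify the actual construction, so everything here is an assumption made for illustration: the function name `dual_region_mask`, the representation of the sketch as a list of (x, y) pixel coordinates with distinct x values, and the use of linear interpolation to obtain a per-column boundary.

```python
import numpy as np

def dual_region_mask(agnostic_mask, curve_points):
    """Split a binary region into upper/lower parts along a sketched curve.

    Hypothetical helper, not the paper's implementation.
    agnostic_mask: (H, W) bool array, True where clothing may be generated.
    curve_points:  iterable of (x, y) pixel coordinates along the user's
                   sketch; x values are assumed to be distinct.
    """
    h, w = agnostic_mask.shape
    pts = np.asarray(sorted(curve_points), dtype=float)  # sort by x
    xs, ys = pts[:, 0], pts[:, 1]
    # Boundary row for every image column; outside the sketched x-range,
    # np.interp holds the first/last y value constant.
    boundary = np.interp(np.arange(w), xs, ys)
    rows = np.arange(h)[:, None]          # shape (H, 1)
    above = rows < boundary[None, :]      # True for pixels above the curve
    upper = agnostic_mask & above
    lower = agnostic_mask & ~above
    return upper, lower

# Toy example: a 4x4 region split by a horizontal sketch at row 2.
mask = np.ones((4, 4), dtype=bool)
upper, lower = dual_region_mask(mask, [(0, 2), (3, 2)])
```

Moving the sketched curve up or down changes where the two regions meet, which is the kind of layout control the abstract describes; how the two regions are then consumed by the layout adjustment blocks is not covered by this sketch.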
