FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On

Given a person image and a garment image, virtual try-on (VTO) aims to synthesize a realistic image of the person wearing the garment while preserving their original pose and identity. Although recent VTO methods excel at visualizing garment appearance, they largely overlook a crucial aspect of the try-on experience: the accuracy of garment fit, for example how an extra-large shirt looks on an extra-small person. A key obstacle is the absence of datasets that provide precise garment and body size information, particularly for "ill-fit" cases, where garments are significantly too large or too small. Consequently, current VTO methods default to generating well-fitted results regardless of garment or person size. In this paper, we take the first steps towards solving this open problem. We introduce FIT (Fit-Inclusive Try-on), a large-scale VTO dataset comprising over 1.13M try-on image triplets accompanied by precise body and garment measurements. We overcome the challenges of data collection via a scalable synthetic strategy: (1) We programmatically generate 3D garments using GarmentCode and drape them via physics simulation to capture realistic garment fit. (2) We employ a novel re-texturing framework to transform synthetic renderings into photorealistic images while strictly preserving geometry. (3) We introduce person identity preservation into our re-texturing model to generate paired person images (same person, different garments) for supervised training. Finally, we leverage our FIT dataset to train a baseline fit-aware virtual try-on model. Our data and results set a new state of the art for fit-aware virtual try-on and offer a robust benchmark for future research. We will make all data and code publicly available on our project page: https://johannakarras.github.io/FIT.
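
To make the idea of a measurement-annotated triplet concrete, the following is a minimal sketch of how one record might be represented and how an "ill-fit" case could be flagged from its measurements. Everything in it is an assumption made for illustration: the abstract does not specify the dataset schema, so the class name `TryOnTriplet`, its fields, the reading of a triplet as (person image, garment image, try-on result), the use of chest circumference alone, and the ease thresholds are all hypothetical and not part of FIT.

```python
from dataclasses import dataclass


@dataclass
class TryOnTriplet:
    """Hypothetical record layout; NOT the actual FIT schema."""
    person_image: str        # assumed: the person wearing a different garment
    garment_image: str       # assumed: the target garment on its own
    target_image: str        # assumed: the same person wearing the target garment
    body_chest_cm: float     # assumed body measurement (chest circumference)
    garment_chest_cm: float  # assumed garment measurement (chest circumference)


def chest_ease_cm(triplet: TryOnTriplet) -> float:
    """Ease: how much larger the garment is than the body at the chest."""
    return triplet.garment_chest_cm - triplet.body_chest_cm


def fit_label(triplet: TryOnTriplet,
              tight_below_cm: float = 0.0,
              loose_above_cm: float = 20.0) -> str:
    """Coarse fit category from chest ease; thresholds are illustrative only."""
    ease = chest_ease_cm(triplet)
    if ease < tight_below_cm:
        return "too small"
    if ease > loose_above_cm:
        return "too large"
    return "well-fitted"


# Usage: an oversized garment on a small body (made-up numbers).
sample = TryOnTriplet(
    person_image="person.png",
    garment_image="garment.png",
    target_image="target.png",
    body_chest_cm=80.0,
    garment_chest_cm=120.0,
)
print(fit_label(sample))
```

A real fit criterion would likely combine several measurements (for example length, waist, and sleeve) rather than a single chest value; the single-measurement rule here only illustrates why paired body and garment sizes are needed to identify ill-fit cases at all.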
