While diffusion-based virtual try-on has achieved impressive visual realism, most methods treat the task as 2D inpainting, prioritizing texture preservation over physical plausibility. Consequently, they often produce plausible-looking images that fail to reflect authentic garment fit across diverse body shapes. We present FitVTON, a fit-aware virtual try-on model for diverse bodies in the wild. FitVTON encodes garment-body size through structured text prompts and learns from simulated try-on triplets from a parameterized garment model. To improve the fitting effect on garment silhouettes, we introduce two auxiliary heads that predict masks for both the garment and the exposed body. We further introduce a texture rectification stage to improve the realism of appearance learned from simulated data. To evaluate fitting fidelity, we curate a real-world dataset, FittingEffect3K, and pair it with a VLM-based scoring protocol. Both subjective and quantitative experiments show that FitVTON demonstrates authentic fitting fidelity, with significant gains in sizing accuracy and shape preservation over state-of-the-art methods while maintaining competitive image quality. Project Page: https://zenoning.github.io/FitVTON/.