In this paper, we propose a novel task, virtual try-on from unconstrained designs (ucVTON), which enables photorealistic synthesis of personalized composite clothing on input human images. Unlike prior approaches, which are constrained to specific input types, our method allows flexible specification of style conditions (text or image) and texture conditions (full garment, cropped sections, or texture patches). Because a full garment image entangles style and texture when used as a condition, we develop a two-stage pipeline that explicitly disentangles the two. In the first stage, we generate a human parsing map that reflects the desired style, conditioned on the input. In the second stage, we composite textures onto the regions of the parsing map according to the texture input. To represent complex and non-stationary textures, which previous fashion editing works have not been able to handle, we are the first to propose extracting hierarchical and balanced CLIP features and applying position encoding in VTON. Experiments demonstrate that our method achieves superior synthesis quality and personalization. Its flexible control over style and texture mixing improves the virtual try-on user experience for online shopping and fashion design.
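The texture representation mentioned in the abstract (hierarchical features plus position encoding) can be illustrated with a minimal sketch. The abstract does not specify how the hierarchy is built, what "balanced" means, or which encoding is used, so everything below is an assumption made for illustration: the function names `hierarchical_patches`, `fourier_encoding`, and `patch_position_code` are hypothetical, the hierarchy is taken to be a simple 2^k x 2^k grid per level, and the position encoding is a generic Fourier-feature encoding with frequencies 2^i * pi. The CLIP image encoder itself is omitted; in a real system each box would be cropped and passed through it.

```python
import math


def hierarchical_patches(width, height, levels=3):
    """Return (level, x0, y0, x1, y1) boxes.

    Assumption: level k splits the image into a 2^k x 2^k grid, so level 0
    is the whole image and deeper levels cover finer texture detail.
    """
    boxes = []
    for level in range(levels):
        n = 2 ** level
        for row in range(n):
            for col in range(n):
                x0 = col * width // n
                y0 = row * height // n
                x1 = (col + 1) * width // n
                y1 = (row + 1) * height // n
                boxes.append((level, x0, y0, x1, y1))
    return boxes


def fourier_encoding(position, dim=8):
    """Encode a scalar position in [0, 1] as dim sin/cos features.

    Assumption: frequencies are 2^i * pi, a common Fourier-feature choice;
    the paper's actual encoding may differ.
    """
    features = []
    for i in range(dim // 2):
        freq = (2 ** i) * math.pi
        features.append(math.sin(freq * position))
        features.append(math.cos(freq * position))
    return features


def patch_position_code(box, width, height, dim=8):
    """Concatenate encodings of the normalized patch center (x, then y)."""
    _, x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2 / width
    cy = (y0 + y1) / 2 / height
    return fourier_encoding(cx, dim) + fourier_encoding(cy, dim)


if __name__ == "__main__":
    boxes = hierarchical_patches(256, 256, levels=3)
    # In a full pipeline, each box would be cropped, embedded with a CLIP
    # image encoder, and paired with its position code (not shown here).
    for box in boxes[:5]:
        code = patch_position_code(box, 256, 256)
        print(box, [round(v, 3) for v in code[:4]])
```

Pairing each patch feature with a code for where the patch came from is one plausible way to let a generator reproduce textures whose appearance varies across the garment, which is the non-stationary case the abstract highlights.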