While image-based virtual try-on has made significant strides, emerging approaches still fall short of delivering high-fidelity and robust fitting images across various scenarios: their models suffer from ill-fitted garment styles and quality degradation during training, and they lack support for various combinations of attire. We therefore first propose a lightweight, scalable operator for attire combinations, the Hydra Block. It uses a parallel attention mechanism to inject the features of multiple garments from conditionally encoded branches into the main network. Second, to significantly enhance the model's robustness and expressiveness in real-world scenarios, we evolve its potential across diverse settings by synthesizing the residuals of multiple models, and we implement a mask region boost strategy to overcome the instability caused by information leakage in existing models. Equipped with these designs, AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large margin, excelling at producing well-fitting garments replete with photorealistic and rich details. Furthermore, AnyFit's impressive performance on high-fidelity virtual try-on in any scenario from any image paves a new path for future research within the fashion community.
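To make the idea of parallel attention over multiple garments concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes that each garment contributes one cross-attention branch whose output is summed with the self-attention output of the main network. The function names (`attention`, `parallel_attention`), the summation rule, and the omission of learned query/key/value projections are all illustrative assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(queries, keys, values):
    """Scaled dot-product attention (no learned projections, for brevity)."""
    scale = math.sqrt(len(queries[0]))
    keys_t = [list(col) for col in zip(*keys)]
    scores = [[v / scale for v in row] for row in matmul(queries, keys_t)]
    return matmul(softmax_rows(scores), values)


def parallel_attention(main_tokens, garment_token_sets):
    """Hypothetical parallel-attention block (an assumption, not the paper's code).

    The main tokens attend to themselves, and in parallel to each garment's
    tokens; the branch outputs are summed, so the output shape does not
    depend on how many garments are supplied.
    """
    fused = attention(main_tokens, main_tokens, main_tokens)
    for garment_tokens in garment_token_sets:
        branch = attention(main_tokens, garment_tokens, garment_tokens)
        fused = [[f + b for f, b in zip(f_row, b_row)]
                 for f_row, b_row in zip(fused, branch)]
    return fused


# Toy inputs: 3 main-network tokens and two garments (e.g. a top and a bottom).
main = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
top = [[0.5, 0.5], [1.0, 0.0]]
bottom = [[0.0, 2.0]]
fused = parallel_attention(main, [top, bottom])
```

Because every branch shares the main network's queries, adding or removing a garment in this sketch changes only the number of summed terms, which is one plausible reading of why such an operator would scale to arbitrary attire combinations.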