Most virtual try-on research aims to serve the fashion business by generating images that demonstrate garments on studio models at a lower cost. However, virtual try-on should be a broader application that also lets customers visualize garments on themselves using their own casual photos, a setting known as in-the-wild try-on. Unfortunately, existing methods, which achieve plausible results in studio try-on settings, perform poorly in the in-the-wild context. This is because these methods often require paired images (garment images paired with images of people wearing the same garment) for training. While such paired data is easy to collect from shopping websites for studio settings, it is difficult to obtain for in-the-wild scenes. In this work, we fill the gap by (1) introducing the StreetTryOn benchmark to support in-the-wild virtual try-on applications and (2) proposing a novel method that learns virtual try-on directly from a set of in-the-wild person images without requiring paired data. We tackle the unique challenges of this setting, including warping garments to more diverse human poses and faithfully rendering more complex backgrounds, with a novel DensePose warping correction method combined with diffusion-based conditional inpainting. Our experiments show competitive performance on standard studio try-on tasks and state-of-the-art (SOTA) performance on street try-on and cross-domain try-on tasks.