Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic, detail-preserving visualization of the garment while warping the garment to accommodate a significant change in body pose and shape across the subjects. Previous methods either focus on preserving garment details without handling pose and shape variation effectively, or allow try-on with the desired shape and pose but lose garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body shape change in a single network. The key ideas behind Parallel-UNet are: 1) the garment is warped implicitly via a cross-attention mechanism, and 2) garment warping and person blending happen as part of a unified process rather than as a sequence of two separate tasks. Experimental results indicate that our method, TryOnDiffusion, achieves state-of-the-art performance both qualitatively and quantitatively.
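To make the idea of implicit warping via cross attention more concrete, the following is a minimal sketch, not the Parallel-UNet itself. It assumes flattened person features act as queries and flattened garment features act as keys and values, so each person location gathers a weighted mix of garment features. The function name `cross_attention`, the feature sizes, and the random projection matrices are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(person_feats, garment_feats, w_q, w_k, w_v):
    """Scaled dot-product cross attention (illustrative sketch).

    person_feats:  (N_p, C) flattened person feature map, used as queries.
    garment_feats: (N_g, C) flattened garment feature map, used as keys/values.
    Returns the attended features (N_p, D) and the attention map (N_p, N_g).
    """
    q = person_feats @ w_q                       # (N_p, D)
    k = garment_feats @ w_k                      # (N_g, D)
    v = garment_feats @ w_v                      # (N_g, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (N_p, N_g)
    attn = softmax(scores, axis=-1)              # each row sums to 1
    return attn @ v, attn


# Toy inputs: sizes and random weights are assumptions, not values from the paper.
rng = np.random.default_rng(0)
C, D = 8, 4
person = rng.normal(size=(16, C))    # e.g. a 4x4 person feature map, flattened
garment = rng.normal(size=(24, C))   # e.g. a 4x6 garment feature map, flattened
w_q, w_k, w_v = (rng.normal(size=(C, D)) for _ in range(3))

warped, attn = cross_attention(person, garment, w_q, w_k, w_v)
print(warped.shape, attn.shape)
```

Each row of `attn` is a soft correspondence from one person location to all garment locations, which is one way to read "implicit" warping: no explicit flow field or geometric transform is estimated, and the correspondence is learned as part of the same network that produces the output.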