In this paper, we first investigate a visual quality degradation problem observed in recent high-resolution virtual try-on approaches. We empirically find that clothing textures tend to be squeezed at the sleeve, as visualized in the upper row of Fig. 1(a). The main cause is a gradient conflict between two popular losses, the Total Variation (TV) loss and the adversarial loss. Specifically, the TV loss aims to disconnect the boundary between the sleeve and torso in the warped clothing mask, whereas the adversarial loss aims to join them. These opposing objectives feed misaligned gradients back to the cascaded appearance flow estimation, resulting in undesirable squeezing artifacts. To reduce them, we propose Sequential Deformation (SD-VITON), which disentangles the appearance flow prediction layers into TV objective-dominant (TVOB) layers and a task-coexistence (TACO) layer. Specifically, we coarsely fit the clothes onto the human body via the TVOB layers and then refine the fit via the TACO layer. In addition, the bottom row of Fig. 1(a) shows a different type of squeezing artifact around the waist. To address it, we further propose to first warp the clothes into a tucked-out shirt style and then partially erase the texture from the warped clothes without hurting the smoothness of the appearance flows. Experimental results show that SD-VITON successfully resolves both types of artifacts and outperforms the baseline methods. Source code will be available at https://github.com/SHShim0513/SD-VITON.
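For readers unfamiliar with the TV term, the following is a minimal sketch of a generic total variation regularizer on a dense flow field. It is not taken from the SD-VITON code: the function name `tv_loss`, the `(H, W, 2)` array layout, and the use of a mean absolute difference are all assumptions made for illustration.

```python
import numpy as np


def tv_loss(flow: np.ndarray) -> float:
    """Anisotropic total variation of a flow field.

    `flow` is assumed to have shape (H, W, 2): one 2-D offset per pixel.
    This is an illustrative form, not the paper's exact loss.
    """
    # Absolute differences between vertically adjacent flow vectors.
    dy = np.abs(flow[1:, :, :] - flow[:-1, :, :])
    # Absolute differences between horizontally adjacent flow vectors.
    dx = np.abs(flow[:, 1:, :] - flow[:, :-1, :])
    return float(dy.mean() + dx.mean())


# A uniform translation has no variation between neighbors.
smooth = np.ones((4, 4, 2))

# An abrupt jump in the x-offset between the left and right halves
# is penalized by the TV term.
torn = smooth.copy()
torn[:, 2:, 0] += 3.0

print(tv_loss(smooth), tv_loss(torn))
```

In this toy case the smooth field scores zero and the field with a discontinuity scores higher, which is the sense in which a TV term favors smooth appearance flows.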