This work addresses a novel Customized Virtual Try-ON (Cu-VTON) task, which superimposes a specified garment onto a model whose appearance, posture, and additional attributes can be customized. Compared with the traditional VTON task, Cu-VTON lets users tailor digital avatars to their individual preferences, making the virtual fitting experience more flexible and engaging. To address this task, we introduce a Neural Clothing Tryer (NCT) framework. NCT equips advanced diffusion models with semantic enhancement and controlling modules, which better preserve the semantic characteristics and textural details of the garment while enabling flexible editing of the model's postures and appearances. Specifically, NCT introduces a semantic-enhanced module that takes semantic descriptions of garments and uses a visual-language encoder to learn features aligned across modalities. These aligned features serve as the conditioning input to the diffusion model, improving the preservation of the garment's semantics. A semantic controlling module then takes the garment image, a tailored posture image, and the semantic description as input, maintaining garment details while editing the model's postures, expressions, and other attributes. Extensive experiments on a publicly available benchmark demonstrate the superior performance of the proposed NCT framework.
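The abstract does not state how the aligned features enter the diffusion model. A common mechanism in text-conditioned latent diffusion models is cross-attention, in which the denoiser's latent tokens attend over a sequence of condition tokens. The sketch below illustrates that generic mechanism in pure Python and should not be read as the authors' implementation. The function names `build_condition` and `cross_attention` are hypothetical. The text and image tokens are assumed to already lie in a shared, aligned embedding space, and the learned query/key/value projections are omitted for brevity.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_condition(text_tokens: List[Vector], image_tokens: List[Vector]) -> List[Vector]:
    """Hypothetical helper: concatenate unit-normalized garment-description
    tokens and garment-image tokens into one condition sequence.
    Assumes both come from a visual-language encoder with aligned spaces."""
    return [l2_normalize(t) for t in text_tokens] + [l2_normalize(t) for t in image_tokens]


def cross_attention(latents: List[Vector], condition: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product cross-attention with no learned
    projections: each latent token attends over the condition tokens
    (used as both keys and values), and the result is added residually."""
    d = len(condition[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in latents:
        weights = softmax([dot(q, k) * scale for k in condition])
        attended = [sum(w * k[i] for w, k in zip(weights, condition)) for i in range(d)]
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out


if __name__ == "__main__":
    cond = build_condition([[1.0, 0.0, 0.0, 0.0]], [[0.0, 2.0, 0.0, 0.0]])
    print(cross_attention([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]], cond))
```

In a real denoiser, this operation would be repeated inside each attention block at every diffusion step. Each latent token receives a mixture of garment-related condition tokens weighted by similarity, so the garment's semantics can steer the generated image.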