This paper tackles text-guided control of StyleGAN for editing garments in full-body human images. Existing StyleGAN-based methods struggle to handle the rich diversity of garments, body shapes, and poses. We propose a framework for text-guided full-body human image synthesis via an attention-based latent code mapper, which enables more disentangled control of StyleGAN than existing mappers. Our latent code mapper adopts an attention mechanism that adaptively manipulates the individual latent codes of different StyleGAN layers under text guidance. In addition, we introduce feature-space masking at inference time to suppress unwanted changes caused by text inputs. Our quantitative and qualitative evaluations show that our method controls generated images more faithfully to the given texts than existing methods.
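To make the abstract's core idea concrete, the following is a minimal numpy sketch of what an attention-based latent code mapper could look like: each per-layer latent code acts as a query over text token embeddings and receives a residual edit. All names, shapes, and projection matrices (`Wq`, `Wk`, `Wv`) are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_mapper(w_plus, text_tokens, Wq, Wk, Wv):
    """Hypothetical cross-attention latent mapper (illustrative only).

    w_plus:      (L, 512) latent codes, one per StyleGAN layer
    text_tokens: (T, D) text token embeddings (e.g., from a text encoder)
    Wq, Wk, Wv:  projection matrices mapping latents and tokens to a
                 shared attention space / back to the latent dimension
    """
    q = w_plus @ Wq                                  # (L, d) layer-wise queries
    k = text_tokens @ Wk                             # (T, d) text keys
    v = text_tokens @ Wv                             # (T, 512) text values
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)    # (L, T) per-layer attention
    delta = attn @ v                                 # (L, 512) residual edits
    return w_plus + delta                            # edited latent codes
```

Because each layer forms its own query, the attention weights differ per layer, so coarse and fine StyleGAN layers can respond to the text prompt to different degrees, which is one way such a mapper could achieve more disentangled edits.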