This paper addresses text-guided control of StyleGAN for editing garments in full-body human images. Existing StyleGAN-based methods struggle to handle the rich diversity of garments, body shapes, and poses. We propose a framework for text-guided full-body human image synthesis built on an attention-based latent code mapper, which enables more disentangled control of StyleGAN than existing mappers. The mapper uses an attention mechanism to adaptively manipulate individual latent codes on different StyleGAN layers under text guidance. In addition, we introduce feature-space masking at inference time to avoid unwanted changes caused by text inputs. Quantitative and qualitative evaluations show that our method controls generated images more faithfully to the given texts than existing methods.
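The abstract does not say how feature-space masking is implemented. As a minimal sketch of one plausible reading, the snippet below blends an intermediate feature map from the unedited image with the one from the text-edited image, so that changes are confined to a masked region. The function name `blend_features`, the array shapes, the random stand-in features, and the hand-drawn square mask are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def blend_features(original, edited, mask):
    """Blend two feature maps using a spatial mask (hypothetical helper).

    original, edited: arrays of shape (C, H, W), assumed to be intermediate
        generator features for the unedited and text-edited latent codes.
    mask: array of shape (H, W) with values in [0, 1]; 1 marks the region
        that is allowed to change (for example, the garment).
    """
    if original.shape != edited.shape:
        raise ValueError("original and edited must have the same shape")
    if mask.shape != original.shape[1:]:
        raise ValueError("mask must match the spatial size of the features")
    m = mask[None, :, :]  # broadcast the mask over the channel axis
    return m * edited + (1.0 - m) * original


# Toy data standing in for real generator features (assumption).
rng = np.random.default_rng(0)
original = rng.normal(size=(4, 8, 8))
edited = original + rng.normal(size=(4, 8, 8))

# Assumed mask: only a central square is allowed to change.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

blended = blend_features(original, edited, mask)
```

Outside the mask the blended features equal the original ones, which is one way to keep the face, background, and pose untouched when the text prompt targets only a garment. A soft mask with values between 0 and 1 would give a gradual transition at the region boundary.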