Image editing is a long-standing challenge in the research community, with far-reaching impact on numerous applications. Recently, text-driven methods have begun to deliver promising results in domains such as human faces, but their application to more complex domains remains relatively limited. In this work, we explore the task of fashion style editing, where we aim to manipulate the fashion style of human imagery using text descriptions. Specifically, we leverage a generative human prior and achieve fashion style editing by navigating its learned latent space. We first verify that existing text-driven editing methods fall short for our problem because their guidance signal is overly simplified, and we propose two directions to reinforce the guidance: textual augmentation and visual referencing. Combined with our empirical findings on the latent space structure, our Fashion Style Editing framework (FaSE) successfully projects abstract fashion concepts onto human images and introduces exciting new applications to the field.