Semantic editing of images is a fundamental goal of computer vision. Although deep learning methods such as generative adversarial networks (GANs) can produce high-quality images, they often lack an inherent way to edit generated images semantically. Recent studies have investigated manipulating the latent variable to control the image to be generated. However, methods that assume linear semantic arithmetic are limited in editing quality, whereas methods that discover nonlinear semantic pathways yield non-commutative editing, which produces inconsistent results when edits are applied in different orders. This study proposes a novel method called deep curvilinear editing (DeCurvEd) to determine semantic commuting vector fields on the latent space. We theoretically demonstrate that, owing to commutativity, the result of editing multiple attributes depends only on the quantities of the edits and not on their order. Furthermore, we experimentally demonstrate that, compared to previous methods, the nonlinear and commutative nature of DeCurvEd facilitates the disentanglement of image attributes and provides higher-quality editing.
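The order-independence claim can be illustrated with a toy two-dimensional "latent space". The sketch below is not the DeCurvEd implementation. It assumes, purely for illustration, that commuting nonlinear edits are built by conjugating axis-aligned translations with a hypothetical invertible map `phi`, and it contrasts them with a rotation and a translation, two nonlinear-path edits that do not commute. All function names here are invented for the example.

```python
import numpy as np

def phi(z):
    """Hypothetical invertible map to 'flat' coordinates (an assumption)."""
    x, y = z
    return np.array([x, y + x**2])

def phi_inv(u):
    """Exact inverse of phi."""
    a, b = u
    return np.array([a, b - a**2])

def commuting_edit(z, k, t):
    """Edit attribute k by amount t: translate along axis k in flat coordinates."""
    u = phi(z)
    u[k] += t
    return phi_inv(u)

def rotate(z, t):
    """Flow of a rotational vector field around the origin."""
    c, s = np.cos(t), np.sin(t)
    return np.array([c * z[0] - s * z[1], s * z[0] + c * z[1]])

def translate(z, t):
    """Flow of a constant vector field along the first axis."""
    return z + np.array([t, 0.0])

z0 = np.array([0.5, -1.0])

# Commuting curvilinear edits: both orders reach the same point.
ab = commuting_edit(commuting_edit(z0, 0, 1.2), 1, -0.7)
ba = commuting_edit(commuting_edit(z0, 1, -0.7), 0, 1.2)
order_gap_commuting = np.linalg.norm(ab - ba)

# Non-commuting edits: the result depends on the order of application.
rt = translate(rotate(z0, 1.2), -0.7)
tr = rotate(translate(z0, -0.7), 1.2)
order_gap_noncommuting = np.linalg.norm(rt - tr)

print("gap with commuting edits:    ", order_gap_commuting)
print("gap with non-commuting edits:", order_gap_noncommuting)
```

In this toy setting the first attribute traces a curved path in the original coordinates, yet the two orders agree up to floating-point error, while the rotation and translation pair ends at visibly different points.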