Generative adversarial networks (GANs) have attained photo-realistic quality in image generation. However, how best to control the image content remains an open challenge. We introduce LatentKeypointGAN, a two-stage GAN that is trained end-to-end on the classical GAN objective with internal conditioning on a set of spatial keypoints. The keypoints and their associated appearance embeddings control, respectively, the position and the style of the generated objects and their parts. A major difficulty, which we address with suitable network architectures and training schemes, is disentangling the image into spatial and appearance factors without domain knowledge or supervision signals. We demonstrate that LatentKeypointGAN provides an interpretable latent space that can be used to re-arrange the generated images by re-positioning and exchanging keypoint embeddings, for example generating portraits by combining the eyes, nose, and mouth from different images. In addition, the explicit generation of keypoints and matching images enables a new, GAN-based method for unsupervised keypoint detection.
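The idea of conditioning on keypoints with per-keypoint appearance embeddings can be illustrated with a toy NumPy sketch. This is not the LatentKeypointGAN architecture: the Gaussian heatmap encoding, the `sigma` value, and the helper names `keypoint_heatmaps`, `style_map`, and `swap_part` are all assumptions made for illustration. It shows only how moving a keypoint changes where a style is applied, and how exchanging one embedding changes what is applied there.

```python
import numpy as np


def keypoint_heatmaps(keypoints, size, sigma=0.1):
    """Render K keypoints as Gaussian heatmaps (an assumed encoding).

    keypoints: (K, 2) array of (x, y) coordinates in [-1, 1].
    Returns an array of shape (K, size, size), indexed as [k, row(y), col(x)].
    """
    coords = np.linspace(-1.0, 1.0, size)
    xs, ys = np.meshgrid(coords, coords)  # xs varies along columns, ys along rows
    dx = xs[None] - keypoints[:, 0, None, None]
    dy = ys[None] - keypoints[:, 1, None, None]
    return np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))


def style_map(heatmaps, embeddings):
    """Spread each keypoint's embedding over the image, weighted by its heatmap.

    heatmaps: (K, H, W), embeddings: (K, D). Returns a (D, H, W) style map.
    """
    return np.einsum("khw,kd->dhw", heatmaps, embeddings)


def swap_part(emb_a, emb_b, part):
    """Return a copy of emb_a whose embedding for keypoint `part` comes from emb_b."""
    out = emb_a.copy()
    out[part] = emb_b[part]
    return out


# Two hypothetical "images", each with 3 keypoints and 4-dimensional embeddings.
rng = np.random.default_rng(0)
keypoints = np.array([[0.5, 0.0], [-0.5, 0.0], [0.0, 0.5]])
emb_a = rng.normal(size=(3, 4))
emb_b = rng.normal(size=(3, 4))

heat = keypoint_heatmaps(keypoints, size=33)
mixed = style_map(heat, swap_part(emb_a, emb_b, part=1))  # part 1 taken from image B
```

In a real generator, a map such as `mixed` would condition the image synthesis network; here it only makes the separation between position (the heatmaps) and appearance (the embeddings) concrete.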