Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, and as a result often lack flexibility, precision, and generality. In this work, we study a powerful yet much less explored way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner, as shown in Fig. 1. To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative generator features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, and landscapes. As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even in challenging scenarios, such as hallucinating occluded content and deforming shapes in a way that consistently follows the object's rigidity. Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking. We also showcase the manipulation of real images through GAN inversion.
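
To make the two components more concrete, here is a minimal, illustrative sketch of the tracking idea: after each update, the handle point is re-located by finding the position in a small window whose feature best matches a stored reference feature. The abstract does not spell out the search procedure, so the windowed nearest-neighbor search, the L1 distance, the toy nested-list feature map, and the names `track_point` and `step_toward` are all assumptions made for illustration, not the authors' implementation.

```python
import math

def l1(a, b):
    """L1 distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))

def track_point(feat, ref, point, radius=2):
    """Re-locate a handle point (hypothetical helper).

    feat[y][x] is the feature vector at pixel (x, y); in DragGAN these would
    come from the generator, here they are plain lists (assumption).
    ref is the feature stored for the handle point; point is its last (x, y).
    Returns the (x, y) in the search window closest to ref in L1 distance.
    """
    h, w = len(feat), len(feat[0])
    px, py = point
    best, best_d = point, l1(feat[py][px], ref)
    for y in range(max(0, py - radius), min(h, py + radius + 1)):
        for x in range(max(0, px - radius), min(w, px + radius + 1)):
            d = l1(feat[y][x], ref)
            if d < best_d:
                best, best_d = (x, y), d
    return best

def step_toward(point, target):
    """Unit direction from the handle point to the target (zero if reached)."""
    dx, dy = target[0] - point[0], target[1] - point[1]
    norm = math.hypot(dx, dy)
    return (0.0, 0.0) if norm == 0 else (dx / norm, dy / norm)

# Toy 5x5 feature map: zeros everywhere except one distinctive feature at (3, 2).
feat = [[[0.0, 0.0] for _ in range(5)] for _ in range(5)]
feat[2][3] = [1.0, 1.0]
ref = [1.0, 1.0]

new_point = track_point(feat, ref, point=(2, 2), radius=1)
print(new_point)                       # (3, 2)
print(step_toward(new_point, (3, 4)))  # (0.0, 1.0)
```

In an interactive loop, a motion-supervision step would nudge the image so the content at the handle point moves along `step_toward(...)`, and `track_point` would then update the handle position before the next step. The window radius is a free parameter here, chosen only to keep the example small.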
