The ability of generative models to produce highly realistic synthetic face images has raised security and ethical concerns. As a first line of defense against such fake faces, deep-learning-based forensic classifiers have been developed. While these forensic models can detect whether a face image is synthetic or real with high accuracy, they are also vulnerable to adversarial attacks. Although such attacks can be highly successful in evading detection by forensic classifiers, they introduce visible noise patterns that careful human scrutiny can detect. Additionally, these attacks assume access to the target model(s), which may not always be the case. Attempts have been made to directly perturb the latent space of GANs to produce adversarial fake faces that can circumvent forensic classifiers. In this work, we go one step further and show that it is possible to successfully generate adversarial fake faces with a specified set of attributes (e.g., hair color, eye size, race, gender). To achieve this goal, we leverage the state-of-the-art generative model StyleGAN, whose disentangled representations enable a range of modifications without leaving the manifold of natural images. We propose a framework to search for adversarial latent codes within the feature space of StyleGAN, where the search can be guided either by a text prompt or by a reference image. We also propose a meta-learning-based optimization strategy to achieve transferable performance on unknown target models. Extensive experiments demonstrate that the proposed approach can produce semantically manipulated adversarial fake faces that are true to the specified attribute set, successfully fool forensic face classifiers, and remain undetectable by humans. Code: https://github.com/koushiksrivats/face_attribute_attack.
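The core idea can be illustrated with a deliberately tiny sketch. The idea is to search for a latent code whose generated image fools an ensemble of surrogate detectors while staying close to a code with the desired attributes. Everything in the sketch is hypothetical:

- A fixed 3x3 linear map `M` stands in for the StyleGAN generator.
- Three logistic-regression "detectors" (`SURROGATES`) stand in for the forensic classifiers.
- A squared-L2 anchor to a reference latent `W_REF` replaces the text- or image-guided attribute loss.
- The meta-learning step is a first-order, MLDG-style meta-train/meta-test split over the surrogate ensemble. This is an assumption for illustration, not necessarily the exact update rule used in this work.

```python
import math
from typing import List, Sequence

Vec = List[float]

# --- Hypothetical toy stand-ins (NOT StyleGAN, CLIP, or real forensic models) ---
# "Generator": a fixed linear map G(w) = M w from a 3-D latent code to 3-D "image" features.
M: List[Vec] = [[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.5],
                [0.5, 0.0, 1.0]]
# Surrogate forensic detectors: p_fake(x) = sigmoid(v . x), one weight vector each.
SURROGATES: List[Vec] = [[1.0, 1.0, 0.0], [1.0, 0.5, 0.5], [0.8, 1.2, 0.2]]
# Reference latent, assumed to already carry the desired attributes.
W_REF: Vec = [1.0, 1.0, 1.0]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    # log(1 + e^z) = -log(1 - sigmoid(z)), computed stably.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: List[Vec], v: Sequence[float]) -> Vec:
    return [dot(row, v) for row in m]


MT: List[Vec] = [list(col) for col in zip(*M)]  # M transposed


def p_fake(v: Vec, w: Vec) -> float:
    # Probability that detector v labels the generated image G(w) as fake.
    return sigmoid(dot(v, matvec(M, w)))


def adv_loss(vs: List[Vec], w: Vec) -> float:
    # Mean of -log(1 - p_fake): small when every detector calls G(w) "real".
    return sum(softplus(dot(v, matvec(M, w))) for v in vs) / len(vs)


def adv_grad(vs: List[Vec], w: Vec) -> Vec:
    # Gradient of adv_loss: d/dw softplus(v . M w) = sigmoid(v . M w) * (M^T v).
    g = [0.0] * len(w)
    for v in vs:
        s = sigmoid(dot(v, matvec(M, w)))
        g = [gi + s * ci / len(vs) for gi, ci in zip(g, matvec(MT, v))]
    return g


def meta_attack(w_ref: Vec, vs: List[Vec], lam: float = 0.05,
                alpha: float = 0.05, lr: float = 0.05, steps: int = 300) -> Vec:
    """First-order, MLDG-style latent search: each step holds out one detector
    as meta-test and rewards codes that still fool it after a virtual step on
    the remaining (meta-train) detectors."""
    w = list(w_ref)
    for t in range(steps):
        k = t % len(vs)                      # rotate the held-out detector
        train, test = vs[:k] + vs[k + 1:], [vs[k]]
        g_tr = adv_grad(train, w)
        w_virtual = [wi - alpha * gi for wi, gi in zip(w, g_tr)]
        g_te = adv_grad(test, w_virtual)     # first-order: no second derivatives
        # Attribute term lam * ||w - w_ref||^2 keeps the code near the reference.
        g_attr = [2.0 * lam * (wi - ri) for wi, ri in zip(w, w_ref)]
        w = [wi - lr * (a + b + c) for wi, a, b, c in zip(w, g_tr, g_te, g_attr)]
    return w


w_adv = meta_attack(W_REF, SURROGATES)
for v in SURROGATES:
    print(f"p_fake: before={p_fake(v, W_REF):.3f} after={p_fake(v, w_adv):.3f}")
```

Holding out a different detector at each step approximates optimizing for a model the attack has not been tuned on, which is the intuition behind using meta-learning for transferability. In a real pipeline, `M` would be the StyleGAN synthesis network and `g_attr` a text- or image-guided similarity loss. The gradients would come from an autodiff framework rather than being written by hand.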