Evading Forensic Classifiers with Attribute-Conditioned Adversarial Faces

The ability of generative models to produce highly realistic synthetic face images has raised security and ethical concerns. As a first line of defense against such fake faces, deep-learning-based forensic classifiers have been developed. While these forensic models can detect whether a face image is synthetic or real with high accuracy, they are also vulnerable to adversarial attacks. Although such attacks can be highly successful in evading detection by forensic classifiers, they introduce visible noise patterns that are detectable through careful human scrutiny. Additionally, these attacks assume access to the target model(s), an assumption that may not always hold. Attempts have been made to directly perturb the latent space of GANs to produce adversarial fake faces that can circumvent forensic classifiers. In this work, we go one step further and show that it is possible to successfully generate adversarial fake faces with a specified set of attributes (e.g., hair color, eye size, race, gender). To achieve this goal, we leverage the state-of-the-art generative model StyleGAN with disentangled representations, which enables a range of modifications without leaving the manifold of natural images. We propose a framework to search for adversarial latent codes within the feature space of StyleGAN, where the search can be guided either by a text prompt or a reference image. We also propose a meta-learning-based optimization strategy to achieve transferable performance on unknown target models. Extensive experiments demonstrate that the proposed approach can produce semantically manipulated adversarial fake faces that are true to the specified attribute set and can successfully fool forensic face classifiers, while remaining undetectable by humans. Code: https://github.com/koushiksrivats/face_attribute_attack
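The idea of searching a generator's latent space for a code that both matches a target attribute and evades a classifier can be illustrated with a deliberately tiny sketch. This is not the authors' implementation: the linear "generator" `A`, the logistic "forensic classifier" (`c`, `b`), the attribute direction `a`, and the weights `lam` and `mu` are all hypothetical stand-ins for StyleGAN, a real forensic model, and a text- or image-guided attribute loss. It assumes white-box access to a single classifier and omits the meta-learning step entirely.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def generate(A, w):
    """Toy linear 'generator' (assumption): image features x = A w."""
    return [dot(row, w) for row in A]

def p_fake(A, c, b, w):
    """Toy logistic 'forensic classifier' (assumption): P(fake | G(w))."""
    return sigmoid(dot(c, generate(A, w)) + b)

def search_adversarial_latent(w0, A, c, b, a, target,
                              lam=1.0, mu=0.01, lr=0.05, steps=500):
    """Gradient descent on
         -log(1 - p_fake(G(w)))            # push classifier toward 'real'
         + lam * (a . G(w) - target)^2     # match the requested attribute
         + mu  * ||w - w0||^2              # stay close to the starting code
    """
    w = list(w0)
    for _ in range(steps):
        x = generate(A, w)
        p = sigmoid(dot(c, x) + b)
        attr_err = dot(a, x) - target
        # d/dx of -log(1 - sigmoid(c.x + b)) is sigmoid(c.x + b) * c
        grad_x = [p * ci + 2.0 * lam * attr_err * ai for ci, ai in zip(c, a)]
        # Chain rule through the linear generator (A^T grad_x) plus proximity term
        grad_w = [
            sum(A[i][j] * grad_x[i] for i in range(len(A)))
            + 2.0 * mu * (w[j] - w0[j])
            for j in range(len(w))
        ]
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return w

if __name__ == "__main__":
    A = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    c = [1.0, 0.0, 0.0, 0.0]   # classifier looks at feature 0
    a = [0.0, 1.0, 0.0, 0.0]   # attribute is read from feature 1
    w0 = [2.0, 0.0, 0.0, 0.0]  # starting code, initially flagged as fake
    w = search_adversarial_latent(w0, A, c, 0.0, a, target=1.0)
    print("p_fake before:", p_fake(A, c, 0.0, w0))
    print("p_fake after: ", p_fake(A, c, 0.0, w))
    print("attribute value:", dot(a, generate(A, w)))
```

Because the perturbation is applied to the latent code rather than to pixels, the output stays within whatever the generator can produce, which is the intuition behind avoiding visible noise patterns. A real attack would replace each toy component with a learned model and rely on automatic differentiation.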

