DeepFake applications have become popular in recent years, but their abuse poses a serious privacy threat. Unfortunately, most detection algorithms intended to mitigate this abuse are inherently vulnerable to adversarial attacks because they are built atop DNN-based classification models, and the literature has demonstrated that they can be bypassed by introducing pixel-level perturbations. Although corresponding mitigations have been proposed, we identify a new attribute-variation-based adversarial attack (AVA) that bypasses such mitigations by perturbing the latent space via a combination of a Gaussian prior and a semantic discriminator. AVA perturbs semantics in the attribute space of DeepFake images; the resulting changes (e.g., an open mouth) are inconspicuous to human beings but can result in substantial differences in DeepFake detection. We evaluate the proposed AVA attack on nine state-of-the-art DeepFake detection algorithms and applications. The empirical results demonstrate that AVA outperforms state-of-the-art black-box attacks against DeepFake detectors and achieves a success rate of more than 95% on two commercial DeepFake detectors. Moreover, our human study indicates that the manipulations in AVA-generated DeepFake images are often imperceptible to humans, which raises serious security and privacy concerns.