We introduce a novel methodology for identifying adversarial attacks on deepfake detectors using eXplainable Artificial Intelligence (XAI). As deepfakes have become a potent tool, demand has grown for efficient detection systems. However, these systems are frequently targeted by adversarial attacks that degrade their performance. We address this vulnerability by developing a defensible deepfake detector that leverages XAI. The proposed methodology uses XAI to generate interpretability maps for a given method, providing explicit visualizations of the factors that drive the model's decisions. We then employ a pretrained feature extractor that processes both the input image and its corresponding XAI image. The resulting feature embeddings are used to train a simple yet effective classifier. Our approach not only contributes to the detection of deepfakes but also enhances the understanding of possible adversarial attacks by pinpointing potential vulnerabilities. Furthermore, because it operates alongside the deepfake detector rather than modifying it, the approach does not change the detector's performance. The results are promising and suggest a potential pathway for future deepfake detection mechanisms. We believe this study will serve as a valuable contribution to the community, sparking much-needed discourse on safeguarding deepfake detectors.
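The pipeline described above (XAI map, shared feature extractor, simple classifier) can be illustrated with a minimal sketch. The abstract does not name the XAI method, the feature extractor, or the classifier, so everything concrete here is an assumption made for illustration: `DETECTOR_W` is a hypothetical linear detector, `toy_explain` uses gradient-times-input attribution, `toy_extract` is a stand-in for a pretrained extractor that returns two summary statistics, `toy_attack` is a sign-based perturbation, and the classifier is a plain logistic regression. Only the data flow in `joint_embedding` mirrors the text.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def joint_embedding(image: Vector,
                    explain: Callable[[Vector], Vector],
                    extract: Callable[[Vector], Vector]) -> Vector:
    """Embed the image and its XAI map with the same extractor, then concatenate."""
    return extract(image) + extract(explain(image))


def sigmoid(z: float) -> float:
    # Clamp the logit so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: Sequence[Vector], y: Sequence[int],
                   epochs: int = 300, lr: float = 0.5) -> Tuple[Vector, float]:
    """Fit a logistic-regression classifier with per-sample gradient steps."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Return 1 if the embedding is classified as adversarial, else 0."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


# --- Hypothetical stand-ins (assumptions, not the paper's components) ---
rng = random.Random(0)
DETECTOR_W = [rng.gauss(0.0, 1.0) for _ in range(16)]  # toy linear "detector"


def toy_explain(image: Vector) -> Vector:
    # Gradient-times-input attribution; the gradient of a linear score is DETECTOR_W.
    return [w * x for w, x in zip(DETECTOR_W, image)]


def toy_extract(v: Vector) -> Vector:
    # Stand-in for a pretrained feature extractor: mean and standard deviation.
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [mean, math.sqrt(var)]


def toy_attack(image: Vector, eps: float = 0.3) -> Vector:
    # Sign-based perturbation that pushes the linear detector score upward.
    return [x + eps * math.copysign(1.0, w) for x, w in zip(image, DETECTOR_W)]


# Build a small labelled set: 0 = clean input, 1 = adversarially perturbed input.
clean = [[rng.random() for _ in range(16)] for _ in range(40)]
attacked = [toy_attack(img) for img in clean]
X = [joint_embedding(img, toy_explain, toy_extract) for img in clean + attacked]
y = [0] * len(clean) + [1] * len(attacked)

w, b = train_logistic(X, y)
train_acc = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(y)
print(f"training accuracy on the toy data: {train_acc:.2f}")
```

Because the attack classifier only consumes the detector's inputs and explanations, the detector itself is never retrained or modified in this sketch, which is consistent with the claim that its performance is left unchanged. In a realistic setting the stand-ins would be replaced by an actual detector, an attribution method, and a pretrained image encoder.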