In this paper, we propose a new framework for evaluating the performance of explanation methods on the decisions of a deepfake detector. The framework assesses how well an explanation method identifies the regions of a fake image that most influence the detector's decision. It does so by examining the extent to which modifying these regions through a set of adversarial attacks flips the detector's prediction or lowers its initial prediction score; we expect a larger drop in deepfake detection accuracy and prediction score for methods that identify these regions more accurately. Based on this framework, we conduct a comparative study using a state-of-the-art deepfake detection model trained on the FaceForensics++ dataset and five explanation methods from the literature. Our quantitative and qualitative evaluations show that the LIME explanation method outperforms the other compared methods, and indicate that it is the most appropriate for explaining the decisions of the utilized deepfake detector.
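The evaluation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the grid-based region selection, the random-search perturbation (a crude stand-in for the adversarial attacks), the toy detector, and all function names and parameters (`top_k_region_mask`, `masked_random_attack`, `evaluate_explanation`, `grid`, `k`, `eps`, `trials`) are assumptions made for illustration. Images are assumed to be grayscale arrays in [0, 1], all of them fake, and the detector is assumed to return a "fake" score in [0, 1].

```python
import numpy as np


def top_k_region_mask(saliency, grid=4, k=3):
    """Boolean mask over the k grid cells with the highest mean saliency."""
    h, w = saliency.shape
    ch, cw = h // grid, w // grid
    # Mean saliency of each cell in a grid x grid partition of the map.
    cells = saliency[:ch * grid, :cw * grid].reshape(grid, ch, grid, cw).mean(axis=(1, 3))
    top = np.argsort(cells.ravel())[::-1][:k]
    mask = np.zeros((h, w), dtype=bool)
    for idx in top:
        r, c = divmod(int(idx), grid)
        mask[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw] = True
    return mask


def masked_random_attack(image, mask, detector, eps=0.5, trials=50, seed=0):
    """Random search restricted to the mask (hypothetical stand-in for an attack).

    Keeps the perturbed image that lowers the detector's fake score the most.
    """
    rng = np.random.default_rng(seed)
    best, best_score = image, detector(image)
    for _ in range(trials):
        noise = rng.uniform(-eps, eps, size=image.shape) * mask
        candidate = np.clip(image + noise, 0.0, 1.0)
        score = detector(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


def evaluate_explanation(images, saliency_maps, detector, threshold=0.5):
    """Detection accuracy on fake images after the attack, and mean score drop."""
    drops, still_detected = [], []
    for img, sal in zip(images, saliency_maps):
        before = detector(img)
        _, after = masked_random_attack(img, top_k_region_mask(sal), detector)
        drops.append(before - after)
        still_detected.append(after >= threshold)
    return {
        "accuracy_after_attack": float(np.mean(still_detected)),
        "mean_score_drop": float(np.mean(drops)),
    }


def toy_detector(img):
    """Toy detector (assumption): the score depends only on the top-left quadrant."""
    return float(img[:8, :8].mean())


images = [np.ones((16, 16))]
good_map = np.zeros((16, 16))
good_map[:8, :8] = 1.0   # highlights the region the toy detector uses
bad_map = np.zeros((16, 16))
bad_map[8:, 8:] = 1.0    # highlights a region the toy detector ignores

good = evaluate_explanation(images, [good_map], toy_detector)
bad = evaluate_explanation(images, [bad_map], toy_detector)
print(good["mean_score_drop"], bad["mean_score_drop"])
```

In this toy setting, the explanation that highlights the region the detector actually relies on yields a larger score drop than the one that highlights an irrelevant region, which mirrors the expectation stated in the abstract. A real study would replace the toy detector, the saliency maps, and the random search with the trained detector, the explanation methods' outputs, and proper adversarial attacks.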