With growing concern about model interpretability, applying eXplainable AI (XAI) tools to deepfake detection models has recently attracted interest. In image classification tasks, XAI tools highlight the pixels that influence a model's decision. This helps in troubleshooting the model and identifying areas where its parameters may need further tuning. Because many tools are available and each may highlight a different set of pixels for the same image, choosing the right tool for a given model is important, so the tools need to be evaluated and the best-performing ones identified. Generic XAI evaluation methods, such as inserting or removing salient pixels or segments, are applicable to general image classification tasks but may produce less meaningful results when applied to deepfake detection models because of how those models function. In this paper, we perform experiments to show that generic removal/insertion XAI evaluation methods are not suitable for deepfake detection models. We also propose and implement an XAI evaluation approach specifically suited to deepfake detection models.