Explain To Me: Salience-Based Explainability for Synthetic Face Detection Models

The performance of convolutional neural networks has continued to improve over the last decade. At the same time, as model complexity grows, it becomes increasingly difficult to explain model decisions. Such explanations may be of critical importance for the reliable operation of human-machine pairing setups, or for model selection when the "best" model among many equally accurate models must be identified. Salience maps are one popular way of explaining model decisions: they highlight the image regions a model deems important when making a prediction. However, examining salience maps one by one is not practical at scale. In this paper, we propose five novel methods of leveraging model salience to explain model behavior at scale. These methods ask: (a) what is the average entropy of a model's salience maps, (b) how does model salience change when the model is fed out-of-set samples, (c) how closely does model salience follow geometrical transformations of the input, (d) how stable is model salience across independent training runs, and (e) how does model salience react to salience-guided image degradations. To assess the proposed measures on a concrete and topical problem, we conducted a series of experiments on the task of synthetic face detection with two types of models: those trained traditionally with cross-entropy loss, and those guided by human salience during training to increase model generalizability. The salience maps of these two types of models have different, interpretable properties, which makes it possible to evaluate the correctness of the proposed measures. We offer source code for each measure along with this paper.
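As an illustration of measure (a), the sketch below computes an average entropy over a set of salience maps. It is not the authors' released code: it assumes each map is a non-negative 2D array that can be normalized into a probability distribution over pixels, and it uses Shannon entropy in bits. The function names `salience_entropy` and `mean_salience_entropy` are hypothetical.

```python
import numpy as np


def salience_entropy(salience_map):
    """Shannon entropy (in bits) of one salience map.

    Assumption: the map is non-negative and is treated as an
    unnormalized probability distribution over pixels.
    """
    s = np.asarray(salience_map, dtype=np.float64)
    if np.any(s < 0):
        raise ValueError("salience values must be non-negative")
    total = s.sum()
    if total <= 0:
        raise ValueError("salience map must contain some positive mass")
    p = s.ravel() / total
    nonzero = p[p > 0]  # by convention, 0 * log(0) contributes 0
    return float(-(nonzero * np.log2(nonzero)).sum())


def mean_salience_entropy(salience_maps):
    """Average per-map entropy over an iterable of salience maps."""
    return float(np.mean([salience_entropy(m) for m in salience_maps]))


if __name__ == "__main__":
    diffuse = np.ones((4, 4))      # salience spread evenly over 16 pixels
    focused = np.zeros((4, 4))
    focused[1, 2] = 1.0            # all salience on a single pixel
    print(salience_entropy(diffuse))
    print(salience_entropy(focused))
    print(mean_salience_entropy([diffuse, focused]))
```

Under this definition, a map that spreads salience evenly has high entropy and a map concentrated on a few pixels has low entropy, so the average summarizes how focused a model's salience is across a whole dataset without inspecting individual maps.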
