The quality of explanations for the predictions made by complex machine learning predictors is often measured using insertion and deletion metrics, which assess the faithfulness of the explanations, i.e., how accurately the explanations reflect the predictor's behavior. To improve faithfulness, we propose insertion/deletion metric-aware explanation-based optimization (ID-ExpO), which optimizes differentiable predictors to improve both the insertion and deletion scores of their explanations while maintaining the predictors' accuracy. Because the original insertion and deletion metrics are non-differentiable with respect to the explanations and therefore cannot be used directly for gradient-based optimization, we extend the metrics so that they are differentiable and use them to formalize insertion and deletion metric-based regularizers. Our experimental results on image and tabular datasets show that deep neural network-based predictors fine-tuned using ID-ExpO enable popular post-hoc explainers to produce more faithful and easier-to-interpret explanations while maintaining high predictive accuracy. The code is available at https://github.com/yuyay/idexpo.
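The abstract relies on the insertion and deletion metrics without spelling them out. The sketch below shows the standard, non-differentiable form as it is commonly defined. Features are ranked by attribution. For deletion, the top-ranked features are removed from the input one at a time; for insertion, they are revealed one at a time starting from a baseline. The predictor's score is recorded after each step, and the area under the resulting curve is reported. Higher insertion and lower deletion scores indicate a more faithful explanation. This is a minimal sketch under stated assumptions, not the ID-ExpO code: the predictor is assumed to be a callable returning a scalar score for the target class, removed features are assumed to be replaced by a constant baseline of 0.0, one feature changes per step, and the area is computed with the trapezoidal rule. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def _auc(ys: Sequence[float]) -> float:
    """Trapezoidal area under a curve sampled at evenly spaced x in [0, 1]."""
    n = len(ys) - 1
    return sum((ys[i] + ys[i + 1]) / 2 for i in range(n)) / n


def insertion_deletion_scores(
    predict: Callable[[List[float]], float],  # assumed: returns target-class score
    x: Sequence[float],
    attribution: Sequence[float],
    baseline: float = 0.0,  # assumed constant value for "removed" features
) -> Tuple[float, float]:
    """Return (insertion AUC, deletion AUC) for one input and its explanation."""
    # Rank features from most to least important according to the explanation.
    # The attribution values enter only through this sort order.
    order = sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)

    # Deletion starts from the full input; insertion starts from the baseline.
    deleted = list(x)
    inserted = [baseline] * len(x)
    del_curve = [predict(deleted)]
    ins_curve = [predict(inserted)]

    for i in order:
        deleted[i] = baseline  # remove the next most important feature
        del_curve.append(predict(deleted))
        inserted[i] = x[i]  # reveal the next most important feature
        ins_curve.append(predict(inserted))

    return _auc(ins_curve), _auc(del_curve)
```

For example, with a toy linear predictor `predict = lambda v: 0.5 * v[0] + 0.3 * v[1] + 0.2 * v[2]` and input `[1.0, 1.0, 1.0]`, the attribution `[0.5, 0.3, 0.2]` gives a higher insertion score and a lower deletion score than the reversed attribution `[0.2, 0.3, 0.5]`. In this sketch, the attribution affects the scores only through the sort order, so the scores are piecewise constant in the attribution values. This illustrates why the original metrics cannot be differentiated with respect to the explanation directly.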