Deep learning has become the standard approach for most machine learning tasks. While its impact is undeniable, interpreting the predictions of deep learning models from a human perspective remains a challenge. In contrast to model training, model interpretability is harder to quantify and to pose as an explicit optimization problem. Inspired by the AUC softmax information curve (AUC SIC) metric for evaluating feature attribution methods, we propose a unified discrete optimization framework for feature attribution and feature selection based on subset selection. This leads to a natural adaptive generalization of the path integrated gradients (PIG) method for feature attribution, which we call Greedy PIG. We demonstrate the effectiveness of Greedy PIG on a wide variety of tasks, including image feature attribution, graph compression/explanation, and post-hoc feature selection on tabular data. Our results show that introducing adaptivity is a versatile way to strengthen attribution methods.
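To make the notion of adaptivity concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not spell out the algorithm, so the sketch assumes one plausible reading: compute path integrated gradients along a straight line from a baseline to the input, select the highest-scoring features, fix them at their input values in the baseline, and recompute attributions for the rest. The function names (`integrated_gradients`, `greedy_pig_sketch`), the toy model, and all parameters are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
GradFn = Callable[[Sequence[float]], Vector]


def integrated_gradients(grad_fn: GradFn, baseline: Sequence[float],
                         x: Sequence[float], steps: int = 50) -> Vector:
    """Midpoint Riemann-sum approximation of path integrated gradients
    along the straight line from `baseline` to `x`."""
    n = len(x)
    total = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for b, xi in zip(baseline, x)]
        grad = grad_fn(point)
        for i in range(n):
            total[i] += grad[i]
    # Scale the averaged gradient by the displacement of each feature.
    return [(x[i] - baseline[i]) * total[i] / steps for i in range(n)]


def greedy_pig_sketch(grad_fn: GradFn, baseline: Sequence[float],
                      x: Sequence[float], rounds: int,
                      per_round: int = 1, steps: int = 50) -> List[int]:
    """Assumed adaptive loop: after each selection, the chosen features are
    fixed at their input values and attributions are recomputed."""
    current = list(baseline)
    selected: List[int] = []
    for _ in range(rounds):
        remaining = [i for i in range(len(x)) if i not in selected]
        if not remaining:
            break
        scores = integrated_gradients(grad_fn, current, x, steps)
        remaining.sort(key=lambda i: scores[i], reverse=True)
        for i in remaining[:per_round]:
            selected.append(i)
            current[i] = x[i]  # reveal the selected feature
    return selected


# Hypothetical toy model with two redundant features (0 and 1):
# f(x) = x0 + x1 - x0*x1 + 0.4*x2
def toy_grad(p: Sequence[float]) -> Vector:
    return [1.0 - p[1], 1.0 - p[0], 0.4]


baseline = [0.0, 0.0, 0.0]
x = [1.0, 1.0, 1.0]

# Non-adaptive ranking: a single attribution pass, then take the top 2.
scores = integrated_gradients(toy_grad, baseline, x)
one_shot = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)[:2]

# Adaptive ranking: two rounds, one feature per round.
adaptive = greedy_pig_sketch(toy_grad, baseline, x, rounds=2)
```

In this toy setting the single-pass ranking selects the two redundant features (indices 0 and 1), while the adaptive loop selects indices 0 and 2, because once feature 0 is fixed the recomputed attribution of feature 1 drops to zero. This illustrates why re-scoring after each selection can matter for subset selection; it says nothing about the paper's actual experimental results.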