Model interpretation aims to extract insights from the internals of a trained model. A common approach to this task is to characterize the relevant features internally encoded in the model that are critical for its proper operation. Despite recent progress, these methods have the weakness of being computationally expensive due to the dense dataset evaluation they require. As a consequence, research on the design of these methods has focused on smaller data subsets, which may lead to reduced insights. To address these computational costs, we propose a coreset-based interpretation framework that uses coreset selection methods to sample a representative subset of a large dataset for the interpretation task. Toward this goal, we propose a similarity-based evaluation protocol to assess the robustness of model interpretation methods to the amount of data they take as input. Experiments considering several interpretation methods, DNN models, and coreset selection methods show the effectiveness of the proposed framework.
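The core idea can be illustrated with a minimal sketch. Here, k-center greedy selection (one common coreset selection method, used purely as an example and not necessarily the one in this work) picks a small, representative subset of feature embeddings on which the downstream interpretation step would then run. The feature matrix, budget, and function names are illustrative assumptions.

```python
# Illustrative sketch: k-center greedy coreset selection over feature
# embeddings. The interpretation method would then be run only on the
# selected subset instead of the full dataset.
import numpy as np


def k_center_greedy(features: np.ndarray, budget: int, seed: int = 0) -> np.ndarray:
    """Pick `budget` indices whose points cover the feature space well."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    selected = [int(rng.integers(n))]
    # Distance of every point to its nearest selected center so far.
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(dists))  # farthest point from the current coreset
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return np.array(selected)


# Toy usage: 1000 points in a 16-d "embedding" space, 5% coreset.
X = np.random.default_rng(1).normal(size=(1000, 16))
idx = k_center_greedy(X, budget=50)
coreset = X[idx]
```

The greedy farthest-point rule keeps the subset spread out, which is why such coresets can stand in for the full dataset when probing which features a model relies on.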