Post-hoc interpretability methods are critical tools for explaining neural-network results. Several post-hoc methods have emerged in recent years, but when applied to a given task they produce different results, raising the question of which method is best suited to provide correct post-hoc interpretability. To understand the performance of each method, quantitative evaluation of interpretability methods is essential. However, currently available frameworks have several drawbacks that hinder the adoption of post-hoc interpretability methods, especially in high-risk sectors. In this work, we propose a framework with quantitative metrics to assess the performance of existing post-hoc interpretability methods, in particular for time series classification. We show that our framework addresses several drawbacks identified in the literature, namely dependence on human judgement, the need for retraining, and the shift in the data distribution caused by occluding samples. We additionally design a synthetic dataset with known discriminative features and tunable complexity. The proposed methodology and quantitative metrics can be used to assess the reliability of interpretability results obtained in practical applications. In turn, they can be embedded within operational workflows in critical fields that require accurate interpretability results, e.g., to comply with regulatory policies.
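To make the idea of a synthetic dataset with known discriminative features concrete, the following is a minimal sketch. The sine-burst pattern, the noise knob for tunable complexity, and the precision-at-k metric are illustrative assumptions, not the paper's actual construction; they only show how ground-truth saliency masks enable quantitative evaluation without human judgement, retraining, or occlusion.

```python
import numpy as np

def make_synthetic_dataset(n_samples=200, length=128, pattern_len=16,
                           noise=0.5, seed=0):
    """Binary time-series classification with known discriminative features.

    Class-1 samples contain a sine burst at a random position; the returned
    boolean mask marks exactly where that burst lies, giving ground-truth
    saliency. `noise` tunes the difficulty (an illustrative complexity knob).
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, noise, size=(n_samples, length))
    y = rng.integers(0, 2, size=n_samples)
    mask = np.zeros((n_samples, length), dtype=bool)  # ground-truth saliency
    burst = np.sin(np.linspace(0.0, 3.0 * np.pi, pattern_len))
    for i in np.flatnonzero(y == 1):
        start = rng.integers(0, length - pattern_len)
        X[i, start:start + pattern_len] += burst
        mask[i, start:start + pattern_len] = True
    return X, y, mask

def precision_at_k(attribution, mask, k):
    """Fraction of the k highest-attribution time steps that fall inside the
    known discriminative region -- no retraining or sample occlusion needed,
    so the data distribution is never shifted during evaluation."""
    top_k = np.argsort(attribution)[-k:]
    return mask[top_k].mean()
```

An attribution method can then be scored directly against the mask: a perfect attribution scores 1.0, while a random one scores roughly `pattern_len / length` in expectation.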