In recent years, concept-based approaches have emerged as some of the most promising explainability methods for interpreting the decisions of Artificial Neural Networks (ANNs). These methods seek to discover intelligible visual "concepts" buried within the complex patterns of ANN activations in two key steps: (1) concept extraction, followed by (2) importance estimation. Although all methods share these two steps, they differ in their specific implementations. Here, we introduce a unifying theoretical framework that defines and clarifies both steps. This framework offers several advantages. It allows us (i) to propose new evaluation metrics for comparing different concept extraction approaches; (ii) to leverage modern attribution methods and evaluation metrics to extend and systematically evaluate state-of-the-art concept-based approaches and importance estimation techniques; and (iii) to derive theoretical guarantees regarding the optimality of such methods. We further leverage our framework to address a crucial question in explainability: how to efficiently identify clusters of data points that are classified based on a similar shared strategy. To illustrate these findings and to highlight the main strategies of a model, we introduce a visual representation called the strategic cluster graph. Finally, we present https://serre-lab.github.io/Lens, a dedicated website that offers a complete compilation of these visualizations for all classes of the ImageNet dataset.
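As an illustration of the two steps named above, the following is a minimal sketch and not the authors' implementation. It assumes that concept extraction is done by non-negative matrix factorization (NMF) of an activation matrix, A ≈ UV, and that importance is estimated by gradient-times-input on a toy linear classification head. The function names (`nmf`, `concept_importance`, `frob_error`), the toy activations, and the weight vector `w` are all hypothetical and chosen only for this example.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def nmf(A, k, steps=200, seed=0, eps=1e-9):
    """Step 1 (assumed form): factor non-negative activations A (n x p)
    into concept coefficients U (n x k) and a concept bank V (k x p),
    using Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)
    n, p = len(A), len(A[0])
    U = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    V = [[rng.random() + 0.1 for _ in range(p)] for _ in range(k)]
    for _ in range(steps):
        # V <- V * (U^T A) / (U^T U V)
        Ut = transpose(U)
        num, den = matmul(Ut, A), matmul(matmul(Ut, U), V)
        V = [[V[i][j] * num[i][j] / (den[i][j] + eps) for j in range(p)]
             for i in range(k)]
        # U <- U * (A V^T) / (U V V^T)
        Vt = transpose(V)
        num, den = matmul(A, Vt), matmul(U, matmul(V, Vt))
        U = [[U[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return U, V


def frob_error(A, U, V):
    """Squared Frobenius reconstruction error ||A - UV||^2."""
    R = matmul(U, V)
    return sum((a - r) ** 2 for ra, rr in zip(A, R) for a, r in zip(ra, rr))


def concept_importance(U, V, w):
    """Step 2 (assumed form): gradient-times-input importance per concept
    for a hypothetical linear head, logit = a . w with a ~= u V."""
    grads = [sum(v * wi for v, wi in zip(row, w)) for row in V]  # d logit / d u_k
    n = len(U)
    return [sum(U[i][c] * grads[c] for i in range(n)) / n for c in range(len(V))]


if __name__ == "__main__":
    # Toy non-negative "activations": 5 inputs, 4 channels, 2 underlying patterns.
    A = [[1, 0, 2, 0], [2, 0, 4, 0], [0, 3, 0, 1], [0, 6, 0, 2], [1, 3, 2, 1]]
    w = [1.0, -1.0, 0.5, 0.0]  # hypothetical class weights of the linear head
    U, V = nmf(A, k=2)
    print("reconstruction error:", frob_error(A, U, V))
    print("concept importances:", concept_importance(U, V, w))
```

For the linear head assumed here, the per-concept importances sum to the mean logit of the reconstructed activations, so each score reads as that concept's share of the class evidence. A nonlinear model would need a different attribution method in place of the closed-form gradient.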