We present ConceptEvo, a unified interpretation framework for deep neural networks (DNNs) that reveals the inception and evolution of learned concepts during training. Our work addresses a critical gap in DNN interpretation research, as existing methods primarily focus on post-training interpretation. ConceptEvo introduces two novel technical contributions: (1) an algorithm that generates a unified semantic space, enabling side-by-side comparison of different models during training, and (2) an algorithm that discovers and quantifies important concept evolutions for class predictions. Through a large-scale human evaluation and quantitative experiments, we demonstrate that ConceptEvo identifies, across different models, concept evolutions that are both comprehensible to humans and crucial for class predictions. ConceptEvo is applicable to both modern DNN architectures, such as ConvNeXt, and classic DNNs, such as VGGs and InceptionV3.