Concept-based interpretability methods aim to explain the predictions of deep neural networks using a predefined set of semantic concepts. These methods evaluate a trained model on a new "probe" dataset and correlate the model's predictions with the visual concepts labeled in that dataset. Despite their popularity, they suffer from limitations that are not well understood or articulated in the literature. In this work, we analyze three commonly overlooked factors in concept-based explanations. First, the choice of the probe dataset has a profound impact on the generated explanations. Our analysis reveals that different probe datasets may lead to very different explanations, and suggests that the explanations do not generalize outside the probe dataset. Second, we find that concepts in the probe dataset are often less salient and harder to learn than the classes they claim to explain, calling into question the correctness of the explanations. We argue that only visually salient concepts should be used in concept-based explanations. Finally, while existing methods use hundreds or even thousands of concepts, our human studies reveal a much stricter upper bound of 32 concepts or fewer, beyond which the explanations are much less practically useful. We make suggestions for future development and analysis of concept-based interpretability methods. Code for our analysis and user interface can be found at \url{https://github.com/princetonvisualai/OverlookedFactors}.
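The evaluate-and-correlate step that these methods share can be illustrated with a minimal sketch. It is not the method of this work or of any specific published technique: it assumes binary concept annotations, uses plain Pearson correlation as a stand-in for the scoring rule, and runs on synthetic data. The function names, the concept names, and the simulated "model" are all hypothetical.

```python
import numpy as np

def concept_correlations(predictions, concepts):
    """Pearson correlation between model outputs and each concept column.

    predictions: shape (n_images,), model scores or 0/1 predicted labels.
    concepts:    shape (n_images, n_concepts), 0/1 concept annotations.
    """
    p = predictions - predictions.mean()
    c = concepts - concepts.mean(axis=0)
    denom = np.sqrt((p ** 2).sum() * (c ** 2).sum(axis=0))
    # Concepts that never vary in the probe set get a correlation of 0.
    return np.divide(p @ c, denom, out=np.zeros(c.shape[1]), where=denom > 0)

def top_concepts(predictions, concepts, names, k=3):
    """Return the k concepts most strongly correlated with the predictions."""
    corr = concept_correlations(predictions, concepts)
    order = np.argsort(-np.abs(corr))[:k]
    return [(names[i], float(corr[i])) for i in order]

# Synthetic probe dataset (assumption): 200 images, 4 labeled concepts.
rng = np.random.default_rng(0)
names = ["bed", "sink", "grass", "sky"]
concepts = rng.integers(0, 2, size=(200, 4)).astype(float)

# Stand-in for a trained model's "bedroom" predictions: it fires
# almost exactly when the "bed" concept is present.
noise = 0.1 * rng.standard_normal(200)
predictions = ((concepts[:, 0] + noise) > 0.5).astype(float)

explanation = top_concepts(predictions, concepts, names, k=2)
print(explanation)
```

Because the scores depend entirely on which images and concept labels the probe dataset contains, rerunning the same sketch with a different `concepts` matrix can rank the concepts differently, which is the dependence on the probe dataset that the first factor above refers to.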