A crucial aspect of understanding the complex nature of Deep Neural Networks (DNNs) is the ability to explain learned concepts within their latent representations. While various methods exist to connect neurons to textual descriptions of human-understandable concepts, evaluating the quality of these explanation methods remains a major challenge in the field due to the lack of a unified, general-purpose quantitative evaluation. In this work, we introduce CoSy (Concept Synthesis) -- a novel, architecture-agnostic framework for evaluating the quality of textual explanations of latent neurons. Given a textual explanation, our framework leverages a generative model conditioned on textual input to synthesize data points that represent the explanation. The neuron's response to these explanation data points is then compared with its response to control data points, yielding a quality estimate for the given explanation. We validate the reliability of our framework in a series of meta-evaluation experiments and demonstrate its practical value through insights gained from benchmarking various concept-based textual explanation methods on Computer Vision tasks, showing that the tested explanation methods differ significantly in quality.
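The core comparison described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the generative model is abstracted away, the activations are simulated, and the function name `cosy_score` as well as the AUC-style scoring rule are assumptions for illustration only. The idea is that a good explanation should make the neuron fire more strongly on images synthesized from the explanation text than on control images.

```python
import numpy as np

def cosy_score(concept_acts, control_acts):
    """AUC-style score: the probability that the neuron activates more
    strongly on a concept (explanation) image than on a control image.
    0.5 means the neuron cannot distinguish concept from control;
    1.0 means concept images always elicit a stronger response."""
    concept_acts = np.asarray(concept_acts, dtype=float)
    control_acts = np.asarray(control_acts, dtype=float)
    # Pairwise comparison of all concept/control activation pairs
    # (a Mann-Whitney U statistic normalized to [0, 1]).
    wins = (concept_acts[:, None] > control_acts[None, :]).sum()
    ties = (concept_acts[:, None] == control_acts[None, :]).sum()
    return (wins + 0.5 * ties) / (concept_acts.size * control_acts.size)

# Toy example with simulated activations: a neuron whose explanation is
# faithful responds more strongly to images generated from that explanation.
rng = np.random.default_rng(0)
concept = rng.normal(2.0, 1.0, size=50)   # activations on generated concept images
control = rng.normal(0.0, 1.0, size=50)   # activations on random control images
print(cosy_score(concept, control))
```

A score near 0.5 would indicate a low-quality explanation, since the neuron's responses to the synthesized concept images are indistinguishable from its responses to the controls.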