A crucial aspect of understanding the complex nature of Deep Neural Networks (DNNs) is the ability to explain learned concepts within their latent representations. While methods exist to connect neurons to human-understandable textual descriptions, evaluating the quality of these explanations is challenging due to the lack of a unified quantitative approach. We introduce CoSy (Concept Synthesis), a novel, architecture-agnostic framework for evaluating textual explanations of latent neurons. Given textual explanations, our proposed framework uses a generative model conditioned on textual input to create data points representing the explanations. By comparing the neuron's response to these generated data points and control data points, we can estimate the quality of the explanation. We validate our framework through sanity checks and benchmark various neuron description methods for Computer Vision tasks, revealing significant differences in quality.
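The core comparison described above — scoring an explanation by how well a neuron's activations separate explanation-conditioned synthetic images from control images — can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function name `cosy_score`, the use of a rank-based AUC as the separation measure, and the toy activation data are all assumptions for illustration.

```python
import numpy as np

def cosy_score(act_synthetic, act_control):
    """Hypothetical sketch of a CoSy-style score: AUC measuring how well a
    neuron's activations separate images generated from a textual explanation
    (act_synthetic) from control images (act_control).

    1.0  -> neuron fires strictly more on the generated concept images,
    0.5  -> no separation (explanation does not match the neuron's behavior).
    """
    acts = np.concatenate([act_synthetic, act_control])
    labels = np.concatenate([np.ones(len(act_synthetic)),
                             np.zeros(len(act_control))])
    # Rank-based AUC via the Mann-Whitney U statistic.
    order = np.argsort(acts)
    ranks = np.empty(len(acts), dtype=float)
    ranks[order] = np.arange(1, len(acts) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Toy example: a neuron that responds more strongly to the generated images.
rng = np.random.default_rng(0)
act_synthetic = rng.normal(2.0, 1.0, size=100)  # activations on generated images
act_control = rng.normal(0.0, 1.0, size=100)    # activations on control images
print(cosy_score(act_synthetic, act_control))   # high -> good explanation
```

A higher score indicates the textual explanation captures what actually drives the neuron; a score near 0.5 indicates the generated concept images are indistinguishable, to that neuron, from the control set.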