Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods

Language Models (LMs) acquire parametric knowledge during training and embed it in their weights. The growing scale of LMs, however, makes it harder to understand a model's inner workings, and harder still to update or correct this embedded knowledge without the significant cost of retraining. This underscores the importance of uncovering exactly what knowledge is stored and how it is associated with specific model components. Instance Attribution (IA) and Neuron Attribution (NA) offer insights into this training-acquired knowledge, but they have not been compared systematically. Our study introduces a novel evaluation framework to quantify and compare the knowledge revealed by IA and NA. To align the results of the two methods, we introduce NA-Instances, which applies NA to retrieve influential training instances, and IA-Neurons, which identifies the important neurons of the influential instances discovered by IA. We further propose a comprehensive suite of faithfulness tests to evaluate the comprehensiveness and sufficiency of the explanations provided by both methods. Through extensive experiments and analysis, we demonstrate that NA generally reveals more diverse and comprehensive information about the LM's parametric knowledge than IA. Nevertheless, IA provides unique and valuable insights into the LM's parametric knowledge that NA does not reveal. Our findings further suggest the potential of a synergistic approach that combines the diverse findings of IA and NA for a more holistic understanding of an LM's parametric knowledge.
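The abstract does not spell out how the faithfulness tests are computed, so the following is only a minimal sketch of one common way to measure comprehensiveness and sufficiency for neuron-level explanations. Everything in it is an assumption made for illustration: the toy one-layer classifier, the names `predict_proba` and `faithfulness`, the activation-times-weight attribution score, and the choice to ablate neurons by zeroing them. Comprehensiveness is taken here as the drop in the predicted-class probability when the top-k attributed neurons are removed, and sufficiency as the drop when only those neurons are kept.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def predict_proba(hidden, W, mask=None):
    """Class probabilities of a toy linear head over hidden activations.

    `mask` (hypothetical) is a 0/1 vector; neurons with mask 0 are zeroed.
    """
    if mask is not None:
        hidden = hidden * mask
    return softmax(hidden @ W)


def faithfulness(hidden, W, scores, k):
    """Return (comprehensiveness, sufficiency) for the top-k neurons.

    Assumed definitions, not taken from the paper:
      comprehensiveness = p(full) - p(top-k neurons ablated)
      sufficiency       = p(full) - p(only top-k neurons kept)
    Both are measured on the class predicted by the unmodified model.
    """
    probs = predict_proba(hidden, W)
    label = int(np.argmax(probs))
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k highest scores

    ablate = np.ones_like(hidden)
    ablate[top_k] = 0.0                   # remove the explanation
    keep = np.zeros_like(hidden)
    keep[top_k] = 1.0                     # keep only the explanation

    p_full = probs[label]
    comprehensiveness = p_full - predict_proba(hidden, W, ablate)[label]
    sufficiency = p_full - predict_proba(hidden, W, keep)[label]
    return comprehensiveness, sufficiency


# Toy data: 16 hidden neurons, 3 output classes.
rng = np.random.default_rng(0)
hidden = rng.normal(size=16)
W = rng.normal(size=(16, 3))

label = int(np.argmax(predict_proba(hidden, W)))
# Illustrative attribution: each neuron's contribution to the predicted logit.
scores = hidden * W[:, label]

comp, suff = faithfulness(hidden, W, scores, k=4)
print(f"comprehensiveness={comp:.3f}  sufficiency={suff:.3f}")
```

Under these assumed definitions, a faithful explanation yields a high comprehensiveness value and a sufficiency value close to zero. The same two quantities could in principle be computed for instance-level explanations by removing or retaining training instances instead of neurons, but that requires retraining or fine-tuning and is not shown here.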
