We present a novel approach to detecting noun abstraction within a large language model (LLM). Starting from a psychologically motivated set of noun pairs in taxonomic relationships, we instantiate surface patterns indicating hypernymy and analyze the attention matrices produced by BERT. We compare the results to two sets of counterfactuals and show that we can detect hypernymy in the abstraction mechanism, an effect that cannot be explained by the distributional similarity of the noun pairs alone. Our findings are a first step towards the explainability of conceptual abstraction in LLMs.