We present a novel approach to detecting noun abstraction within a large language model (LLM). Starting from a psychologically motivated set of noun pairs in taxonomic relationships, we instantiate surface patterns indicating hypernymy and analyze the attention matrices produced by BERT. We compare the results against two sets of counterfactuals and show that hypernymy can be detected in the abstraction mechanism, an effect that cannot be attributed solely to the distributional similarity of the noun pairs. Our findings are a first step towards the explainability of conceptual abstraction in LLMs.