This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning) by distinguishing two LLM evaluation paradigms: psycholinguistic and neurolinguistic. Traditional psycholinguistic evaluations often reflect statistical biases that may misrepresent LLMs' true linguistic capabilities. We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pairs with diagnostic probing to analyze activation patterns across model layers. This method allows for a detailed examination of how LLMs represent form and meaning, and of whether these representations are consistent across languages. Our contributions are three-fold: (1) We compare neurolinguistic and psycholinguistic methods, revealing distinct patterns in LLM assessment; (2) We demonstrate that LLMs exhibit higher competence in form than in meaning, with the latter largely dependent on the former; (3) We present new conceptual minimal pair datasets for Chinese (COMPS-ZH) and German (COMPS-DE), complementing existing English datasets.
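The layer-wise diagnostic probing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: synthetic vectors stand in for real LLM activations, and a linear probe (logistic regression) is trained per layer to separate the acceptable and unacceptable members of each minimal pair, with held-out accuracy as the readout.

```python
# Sketch of layer-wise diagnostic probing on minimal pairs.
# Assumption: synthetic activations replace real model states; the
# per-layer probe-and-score loop mirrors the described setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pairs, n_layers, dim = 200, 4, 32


def probe_layer(acts, labels):
    """Fit a linear probe on activations and return held-out accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        acts, labels, test_size=0.25, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)


# Labels: 0 = unacceptable member of the pair, 1 = acceptable member.
labels = np.repeat([0, 1], n_pairs)
accuracies = []
for layer in range(n_layers):
    # Toy assumption: class separation grows with depth, so probe
    # accuracy should rise across layers.
    acts = rng.normal(size=(2 * n_pairs, dim))
    acts[labels == 1, 0] += float(layer)  # inject class signal on one dim
    accuracies.append(probe_layer(acts, labels))

print([round(a, 2) for a in accuracies])
```

Reading accuracy per layer in this way shows where in the network a distinction becomes linearly decodable, which is the kind of evidence the neurolinguistic paradigm relies on instead of the model's surface output probabilities.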