Discourse Entity (DE) recognition is the task of identifying novel and known entities introduced within a text. While previous work has found that large language models have basic, if imperfect, DE recognition abilities (Schuster and Linzen, 2022), it remains largely unassessed which of the fundamental semantic properties governing the introduction of DEs, and subsequent reference to them, these models actually know. We propose the Linguistically-Informed Evaluation for Discourse Entity Recognition (LIEDER) dataset, which allows for a detailed examination of language models' knowledge of four crucial semantic properties: existence, uniqueness, plurality, and novelty. We find evidence that state-of-the-art large language models exhibit sensitivity to all of these properties except novelty, which demonstrates that they have yet to reach human-level language understanding abilities.