Natural Language Processing prides itself on being an empirically minded, if not outright empiricist, field, and yet lately it seems to get drawn into essentialist debates on issues of meaning and measurement ("Do Large Language Models Understand Language, And If So, How Much?"). This is no accident: here, as everywhere, the evidence underspecifies the understanding. As a remedy, this paper sketches the outlines of a model of understanding that can ground questions about the adequacy of current methods for measuring model quality. The paper makes three claims: A) that different types of language use situation have different characteristics, B) that language understanding is a multifaceted phenomenon, bringing together individualistic and social processes, and C) that the choice of Understanding Indicator marks the limits of benchmarking and the beginning of considerations of the ethics of NLP use.