Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (iii) investigate syntax or semantics while overlooking other capabilities that lie at the heart of human language, such as morphology. Here, we close these gaps by conducting the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages: English, German, Tamil, and Turkish. We apply a version of Berko's (1958) wug test to ChatGPT, using novel, uncontaminated datasets for the four examined languages. We find that ChatGPT massively underperforms purpose-built systems, particularly in English. Overall, viewed through the lens of morphology, our results shed new light on the linguistic capabilities of ChatGPT and suggest that claims of human-like language skills are premature and misleading.