Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (iii) investigate syntax or semantics and overlook other capabilities that lie at the heart of human language, like morphology. Here, we close these gaps by conducting the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages (specifically, English, German, Tamil, and Turkish). We apply a version of Berko's (1958) wug test to ChatGPT, using novel, uncontaminated datasets for the four examined languages. We find that ChatGPT massively underperforms purpose-built systems, particularly in English. Overall, our results, viewed through the lens of morphology, cast a new light on the linguistic capabilities of ChatGPT, suggesting that claims of human-like language skills are premature and misleading.