Chomsky and others have very directly claimed that large language models (LLMs) are equally capable of learning languages that are possible and impossible for humans to learn. However, there is very little published experimental evidence to support such a claim. Here, we develop a set of synthetic impossible languages of differing complexity, each designed by systematically altering English data with unnatural word orders and grammar rules. These languages lie on an impossibility continuum: at one end are languages that are inherently impossible, such as random and irreversible shuffles of English words, and at the other, languages that may not be intuitively impossible but are often considered so in linguistics, particularly those with rules based on counting word positions. We report on a wide range of evaluations assessing the capacity of GPT-2 small models to learn these uncontroversially impossible languages, and crucially, we perform these assessments at various stages throughout training to compare the learning process for each language. Our core finding is that GPT-2 struggles to learn impossible languages compared to English as a control, challenging the core claim. More importantly, we hope our approach opens up a productive line of inquiry in which different LLM architectures are tested on a variety of impossible languages, in an effort to learn more about how LLMs can be used as tools for these cognitive and typological investigations.