Do language models (LMs) offer insights into human language learning? A common argument against this idea is that because their architecture and training paradigm differ so vastly from those of humans, LMs can learn arbitrary inputs as easily as natural languages. We test this claim by training LMs to model impossible and typologically unattested languages. Unlike previous work, which has focused exclusively on English, we conduct experiments on 12 languages from 4 language families using two newly constructed parallel corpora. Our results show that while GPT-2 small can largely distinguish attested languages from their impossible counterparts, it does not achieve perfect separation between all attested languages and all impossible ones. We further test whether GPT-2 small distinguishes typologically attested from unattested languages that differ in noun phrase (NP) order, manipulating word order according to Greenberg's Universal 20. We find that the model's perplexity scores do not distinguish attested from unattested word orders, while its performance on the generalization test does. These findings suggest that LMs exhibit some human-like inductive biases, though these biases are weaker than those found in human learners.
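The two quantities the abstract relies on, NP word-order manipulation and perplexity, can be illustrated with a minimal sketch. It assumes that Universal 20 is operationalized as permutations of four NP elements (demonstrative, numeral, adjective, noun) and that perplexity is the exponentiated mean negative log-likelihood per token. The names `CATEGORIES`, `all_np_orders`, `reorder_np`, and `perplexity` are hypothetical and not taken from the paper, and the sketch does not say which orders are attested.

```python
import math
from itertools import permutations

# Hypothetical labels for the four NP elements covered by Universal 20.
CATEGORIES = ("Dem", "Num", "Adj", "N")


def all_np_orders():
    """Return every logically possible ordering of the four NP elements."""
    return list(permutations(CATEGORIES))


def reorder_np(np_tokens, order):
    """Linearize an NP given a mapping category -> word and a target order."""
    return [np_tokens[category] for category in order]


def perplexity(token_logprobs):
    """Exponentiated mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


if __name__ == "__main__":
    phrase = {"Dem": "these", "Num": "three", "Adj": "red", "N": "books"}
    for order in all_np_orders():
        print("-".join(order), " ".join(reorder_np(phrase, order)))
```

In an actual experiment, the log-probabilities passed to `perplexity` would come from a trained LM's token-level scores on held-out text in each reordered corpus. That step is omitted here because the abstract does not describe it.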