Both humans and large language models are able to learn language without explicit structural supervision. What inductive biases make this learning possible? We address this fundamental cognitive question using transformer language models: we inject inductive bias into language models by pretraining them on formally structured data, and then evaluate how well these biased learners acquire typologically diverse natural languages. Our experimental setup provides a testbed for hypotheses about inductive bias in human language learning. We investigate the effect of injecting models with three types of inductive bias: 1) recursive, hierarchical processing, 2) crossing token-token relationships that cannot be modeled by context-free grammars, and 3) a Zipfian power-law vocabulary distribution. We show that non-context-free relationships are the most effective of these inductive biases. Our study leverages transformer models to run controlled language learning experiments that would be impossible to run on humans, and surfaces hypotheses about the structures that facilitate language learning in both humans and machines.
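The abstract does not describe how the formally structured pretraining data are generated. As a purely illustrative, hypothetical sketch (not the authors' generator), the Python below contrasts the three kinds of structure. Pairs of tokens are opened and later closed. Closing always the most recently opened token (last-in, first-out) yields well-nested, context-free (Dyck-like) dependencies. Closing any currently open token lets dependency arcs cross, which context-free grammars cannot capture in general. Token identities are drawn from a Zipfian distribution. The function names `zipf_weights`, `sample_pairs`, and `is_well_nested`, the 0.5 open/close probability, and the Zipf exponent `s` are all assumptions made for illustration. The three biases are combined in one generator only for brevity.

```python
import random


def zipf_weights(vocab_size, s=1.0):
    """Unnormalized Zipfian weights: the token of rank r gets weight 1 / r**s."""
    return [1.0 / (r ** s) for r in range(1, vocab_size + 1)]


def sample_pairs(n_pairs, vocab_size, crossing, rng, s=1.0):
    """Emit n_pairs open/close token pairs (hypothetical generator).

    crossing=False: always close the most recently opened token (LIFO),
      giving well-nested, Dyck-like (context-free) structure.
    crossing=True: close a uniformly chosen open token, so dependency
      arcs may cross (shuffle-Dyck-like, non-context-free structure).
    Token identities are sampled from a Zipfian distribution.
    """
    weights = zipf_weights(vocab_size, s)
    seq, open_toks = [], []
    opened = 0
    while opened < n_pairs or open_toks:
        # Open a new pair if any remain and nothing is open, or on a coin flip.
        if opened < n_pairs and (not open_toks or rng.random() < 0.5):
            tok = rng.choices(range(vocab_size), weights=weights, k=1)[0]
            seq.append(("open", tok))
            open_toks.append(tok)
            opened += 1
        else:
            # open_toks is guaranteed non-empty in this branch.
            idx = rng.randrange(len(open_toks)) if crossing else len(open_toks) - 1
            seq.append(("close", open_toks.pop(idx)))
    return seq


def is_well_nested(seq):
    """True iff every close matches the most recent unmatched open."""
    stack = []
    for kind, tok in seq:
        if kind == "open":
            stack.append(tok)
        elif not stack or stack.pop() != tok:
            return False
    return not stack
```

Nested sequences always pass `is_well_nested`, while crossing sequences typically fail it once several pairs are open at the same time.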