Both humans and transformer language models are able to learn language without explicit structural supervision. What inductive learning biases make this learning possible? In this study, we examine the effect of different inductive learning biases by predisposing language models to structural biases through pretraining on artificial structured data, and then evaluating them by fine-tuning on English. Our experimental setup lets us actively control the inductive bias of language models. With our experiments, we compare the success of three types of inductive bias: 1) an inductive bias for recursive, hierarchical processing, 2) an inductive bias for unrestricted token-token dependencies that cannot be modeled by context-free grammars, and 3) an inductive bias for a Zipfian power-law vocabulary distribution. We show that complex token-token interactions form the best inductive biases, and that this effect is strongest in the non-context-free case. We also show that a Zipfian vocabulary distribution forms a good inductive bias independently of grammatical structure. Our study leverages the capabilities of transformer models to run controlled language learning experiments that cannot be run with humans, and surfaces hypotheses about the structures that facilitate language learning in both humans and machines.
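To make the three kinds of structure concrete, the sketch below generates toy sequences of matched open/close token pairs. It is an illustration only and is not taken from the study: the function names (`zipf_weights`, `sample_sequence`, `is_well_nested`), the 0.5 open/close probability, and the token representation are all assumptions. Closing the most recent open token gives well-nested, context-free structure; closing an arbitrary open token allows crossing dependencies; and the optional weights draw token types from a power-law distribution.

```python
import random


def zipf_weights(vocab_size, alpha=1.0):
    """Return normalized weights proportional to 1 / rank**alpha (assumed form)."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, vocab_size + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_sequence(n_pairs, vocab_size, rng, nested=True, weights=None):
    """Generate a sequence of n_pairs matched open/close token pairs.

    nested=True closes the most recently opened token (LIFO), which yields
    well-nested structure. nested=False closes a randomly chosen open token,
    so dependencies are allowed to cross. weights=None samples token types
    uniformly; pass zipf_weights(...) for a power-law vocabulary.
    """
    types = list(range(vocab_size))
    pending = []  # token types that are open but not yet closed
    seq = []
    opened = 0
    while opened < n_pairs or pending:
        can_open = opened < n_pairs
        # Open a new pair if nothing is pending, otherwise flip a fair coin
        # (the 0.5 probability is an arbitrary illustrative choice).
        if can_open and (not pending or rng.random() < 0.5):
            t = rng.choices(types, weights=weights)[0]
            pending.append(t)
            seq.append(("open", t))
            opened += 1
        else:
            idx = len(pending) - 1 if nested else rng.randrange(len(pending))
            seq.append(("close", pending.pop(idx)))
    return seq


def is_well_nested(seq):
    """Check with a stack that every close matches the most recent open."""
    stack = []
    for kind, t in seq:
        if kind == "open":
            stack.append(t)
        elif not stack or stack.pop() != t:
            return False
    return not stack


if __name__ == "__main__":
    rng = random.Random(0)
    weights = zipf_weights(50)
    nested = sample_sequence(10, 50, rng, nested=True, weights=weights)
    crossing = sample_sequence(10, 50, rng, nested=False, weights=weights)
    print(nested)
    print(is_well_nested(nested), is_well_nested(crossing))
```

Note that a sequence generated with `nested=False` can still happen to be well nested by chance; the flag only removes the LIFO constraint and does not force dependencies to cross.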