Transformers trained on natural language data have been shown to learn its hierarchical structure and to generalize to sentences with unseen syntactic structures, without any explicitly encoded structural bias. In this work, we investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge. We extensively experiment with transformers trained on multiple synthetic datasets and with different training objectives, and show that while objectives such as sequence-to-sequence modeling and prefix language modeling often fail to produce hierarchical generalization, models trained with the language modeling objective consistently learn to generalize hierarchically. We then conduct pruning experiments to study how transformers trained with the language modeling objective encode hierarchical structure. Through pruning, we find that subnetworks with different generalization behaviors coexist within the model: subnetworks corresponding to hierarchical structure and subnetworks corresponding to linear order. Finally, we take a Bayesian perspective to further uncover transformers' preference for hierarchical generalization: we establish a correlation between whether transformers generalize hierarchically on a dataset and whether the simplest explanation of that dataset is provided by a hierarchical grammar rather than by a regular grammar exhibiting linear generalization.