Models need appropriate inductive biases to effectively learn from small amounts of data and generalize systematically outside of the training distribution. While Transformers are highly versatile and powerful, they can still benefit from enhanced structural inductive biases for seq2seq tasks, especially those involving syntactic transformations, such as converting active to passive voice or semantic parsing. In this paper, we propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training to perform synthetically generated syntactic transformations of dependency trees given a description of the transformation. Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking, and also improves structural generalization for semantic parsing. Our analysis shows that the intermediate pre-training leads to attention heads that keep track of which syntactic transformation needs to be applied to which token, and that the model can leverage these attention heads on downstream tasks.
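To make the pre-training objective concrete, here is a minimal Python sketch (our illustration, not the authors' actual data-generation pipeline) of one synthetic syntactic transformation on a dependency tree: fronting the subtree headed by a given dependency relation. The tuple encoding and the `front_subtree` rule are assumptions chosen for illustration.

```python
# A dependency tree as (word, head_index, deprel) triples; head_index = -1 marks the root.

def subtree(tokens, root):
    # Collect all indices in the subtree rooted at `root` (inclusive).
    out = {root}
    changed = True
    while changed:
        changed = False
        for i, (_, head, _) in enumerate(tokens):
            if head in out and i not in out:
                out.add(i)
                changed = True
    return sorted(out)

def front_subtree(tokens, deprel):
    # One toy transformation: move the subtree whose root bears `deprel`
    # to the front of the sentence, keeping the remaining word order.
    root = next(i for i, (_, _, rel) in enumerate(tokens) if rel == deprel)
    moved = subtree(tokens, root)
    rest = [i for i in range(len(tokens)) if i not in moved]
    return [tokens[i][0] for i in moved + rest]

sent = [("the", 1, "det"), ("cat", 2, "nsubj"), ("chased", -1, "root"),
        ("a", 4, "det"), ("mouse", 2, "dobj")]
print(front_subtree(sent, "dobj"))  # ['a', 'mouse', 'the', 'cat', 'chased']
```

In the intermediate pre-training described above, the model would receive a description of such a rule together with the input sentence and be trained to emit the transformed token sequence, which is what encourages attention heads to track which transformation applies to which token.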