Human language is known to exhibit a nested, hierarchical structure, allowing us to form complex sentences out of smaller pieces. However, many state-of-the-art neural network models such as Transformers have no explicit hierarchical structure in their architecture; that is, they lack an inductive bias toward hierarchical structure. Additionally, Transformers are known to perform poorly on compositional generalization tasks that require such structure. In this paper, we introduce Treeformer, a general-purpose encoder module inspired by the CKY algorithm that learns a composition operator and a pooling function to construct hierarchical encodings for phrases and sentences. Our extensive experiments demonstrate the benefits of incorporating hierarchical structure into the Transformer and show significant improvements in compositional generalization as well as in downstream tasks such as machine translation, abstractive summarization, and various natural language understanding tasks.
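As a rough illustration of the CKY-style idea described above, and not the paper's actual implementation, the following minimal sketch fills a chart bottom-up: every span gets one candidate encoding per split point from a composition operator, and a pooling function merges the candidates into a single span encoding. The names `compose`, `pool`, and `cky_encode`, the single tanh layer, the softmax-weighted pooling, and the randomly initialized parameters `W`, `b`, and `v` are all assumptions made for illustration; the abstract does not specify the form of either function.

```python
import numpy as np


def compose(left, right, W, b):
    """Hypothetical composition operator: one tanh layer over the concatenated children."""
    return np.tanh(W @ np.concatenate([left, right]) + b)


def pool(candidates, v):
    """Hypothetical pooling function: softmax-weighted average over split-point candidates."""
    scores = candidates @ v                  # one score per candidate split
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ candidates              # shape (d,)


def cky_encode(tokens, W, b, v):
    """Fill a CKY-style chart bottom-up; chart[(i, j)] encodes the span tokens[i:j]."""
    n = len(tokens)
    # Length-1 spans are simply the token vectors.
    chart = {(i, i + 1): tokens[i] for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # One candidate encoding per split point k, as in CKY parsing.
            candidates = np.stack([
                compose(chart[(i, k)], chart[(k, j)], W, b)
                for k in range(i + 1, j)
            ])
            chart[(i, j)] = pool(candidates, v)
    return chart


# Toy inputs: random vectors stand in for token embeddings and learned parameters.
rng = np.random.default_rng(0)
d, n = 8, 5
tokens = rng.normal(size=(n, d))
W = rng.normal(size=(d, 2 * d)) * 0.1
b = np.zeros(d)
v = rng.normal(size=d)

chart = cky_encode(tokens, W, b, v)
sentence_encoding = chart[(0, n)]  # encoding of the full sequence
print(len(chart), sentence_encoding.shape)
```

In this sketch the chart holds one vector for each of the n(n+1)/2 contiguous spans, and the entry for the full span serves as the sentence encoding. The nested loops evaluate `compose` once per (span, split point) pair, so the number of composition calls grows cubically with sequence length.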