In this paper, we propose a novel architecture called Composition Attention Grammars (CAGs) that recursively composes subtrees into a single vector representation with a composition function and selectively attends to previous structural information with a self-attention mechanism. We investigate whether these two components, the composition function and the self-attention mechanism, can both induce human-like syntactic generalization. Specifically, we train language models (LMs) with and without these two components, with model sizes carefully controlled, and evaluate their syntactic generalization performance on six test circuits of the SyntaxGym benchmark. The results demonstrate that the composition function and the self-attention mechanism both play an important role in making LMs more human-like, and closer inspection of linguistic phenomena suggests that the composition function allows syntactic features, but not semantic features, to percolate into subtree representations.
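To make the two components concrete, the following is a minimal, untrained sketch of the general idea and not the authors' implementation. It assumes a stack of vectors manipulated by shift/open/reduce actions; the names `compose`, `attend`, and `run`, the mean-plus-tanh composition, and the single-query dot-product attention are all illustrative assumptions standing in for the learned components of an actual model.

```python
import math

OPEN = "OPEN"  # tag marking an open (not yet closed) nonterminal on the stack


def compose(children):
    """Toy composition function (assumption): element-wise mean, squashed by tanh.

    Collapses several child vectors into one vector of the same dimension.
    """
    dim = len(children[0])
    return [math.tanh(sum(c[i] for c in children) / len(children)) for i in range(dim)]


def attend(query, vectors):
    """Scaled dot-product attention of a single query over a list of vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, vec)) / scale for vec in vectors]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(len(query))]
    return weights, context


def run(actions):
    """Apply (kind, vector) actions; the stack holds (tag, vector) pairs."""
    stack = []
    for kind, vec in actions:
        if kind == "open":
            stack.append((OPEN, vec))
        elif kind == "shift":
            stack.append(("WORD", vec))
        elif kind == "reduce":
            # Pop children back to the nearest open nonterminal, then replace
            # the whole span with a single composed subtree vector.
            children = []
            while stack[-1][0] != OPEN:
                children.append(stack.pop()[1])
            label = stack.pop()[1]
            children.reverse()
            stack.append(("SUBTREE", compose([label] + children)))
    return stack


# Hypothetical 2-dimensional embeddings for "(S (NP the dog) barks".
actions = [
    ("open", [0.1, 0.2]),   # (S
    ("open", [0.3, -0.1]),  # (NP
    ("shift", [1.0, 0.0]),  # the
    ("shift", [0.0, 1.0]),  # dog
    ("reduce", None),       # ) closes NP into one vector
    ("shift", [0.5, 0.5]),  # barks
]
stack = run(actions)
vectors = [vec for _, vec in stack]
weights, context = attend(vectors[-1], vectors)
```

After the `reduce` action the noun phrase occupies a single stack slot, so the attention step sees three items (the open S, the composed NP, and the latest word) rather than every token individually. That is the sense in which composition gives attention access to structural rather than purely sequential history.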