Despite their success across a broad range of applications, sequence-to-sequence models are argued to construct solutions that are less compositional than human-like generalization. There is mounting evidence that one factor hindering compositional generalization is that the representations in the uppermost encoder and decoder layers are entangled. In other words, the syntactic and semantic representations of sequences are inappropriately intertwined. However, most previous studies concentrate on enhancing token-level semantic information to alleviate the representation entanglement problem, rather than composing and using the syntactic and semantic representations of sequences appropriately, as humans do. In addition, drawing on recent studies of training deeper Transformers, we attribute the entanglement problem mainly to the ``shallow'' residual connections and their simple, one-step operations, which fail to fuse previous layers' information effectively. Starting from this finding and inspired by humans' strategies, we propose \textsc{FuSion} (\textbf{Fu}sing \textbf{S}yntactic and Semant\textbf{i}c Representati\textbf{on}s), an extension to sequence-to-sequence models that learns to fuse previous layers' information back into the encoding and decoding process appropriately by introducing a \emph{fuse-attention module} at each encoder and decoder layer. \textsc{FuSion} achieves competitive and even \textbf{state-of-the-art} results on two realistic benchmarks, empirically demonstrating the effectiveness of our proposal.