In this paper, we propose an explanation of the representations learned by neural sequence encoders based on self-attention networks (SANs). The explanation regards the information captured by the model as graph structures and the model's encoding as the generation of these graph structures. It applies to existing work on SAN-based models, accounts for the relationship among the ability to capture structural or linguistic information, model depth, and sentence length, and can be extended to other models such as those based on recurrent neural networks. Based on this explanation, we also propose a revisited multigraph called Multi-order-Graph (MoG), which models the graph structures in a SAN-based model as subgraphs of the MoG and recasts the model's encoding as the generation of the MoG. We further introduce a Graph-Transformer that enhances the ability to capture multiple subgraphs of different orders and focuses on high-order subgraphs. Experimental results on multiple neural machine translation tasks show that the Graph-Transformer can yield effective performance improvements.
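To make the graph reading of self-attention concrete, the following minimal sketch treats one layer's attention weights as the adjacency matrix of a weighted directed graph over tokens, and composes two layers to obtain two-step paths. It is only an illustration under simplifying assumptions (a single head, no value projections, no residual connections) and is not the MoG construction itself; the names `attention_graph` and `compose`, and the toy token vectors, are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_graph(queries, keys):
    """Scaled dot-product attention weights.

    Entry [i][j] is read as the weight of a directed edge from token i
    to token j; every row sums to 1.
    """
    scale = math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    return weights


def compose(upper, lower):
    """Matrix product upper @ lower.

    Entry [i][j] sums, over intermediate tokens k, the weight of the
    two-step path i -> k (upper layer) followed by k -> j (lower layer).
    """
    n = len(upper)
    return [
        [sum(upper[i][k] * lower[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Toy token vectors (hypothetical values, chosen only for illustration).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

layer1 = attention_graph(tokens, tokens)  # one-step edges among tokens
two_hop = compose(layer1, layer1)         # two-step paths across two layers

for row in two_hop:
    print([round(w, 3) for w in row])
```

In this simplified view, stacking layers lets a token aggregate information along longer paths, which is one informal way to see why model depth and sentence length interact with the structures an encoder can capture.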