Despite the significant progress of transformer models on machine reading comprehension, they still struggle with complex reasoning because the input sequence carries no explicit knowledge. This paper proposes a novel attention pattern that addresses this limitation by integrating reasoning knowledge derived from a heterogeneous graph into the transformer architecture through a graph-enhanced self-attention mechanism. The pattern comprises three key elements: (1) global-local attention for word tokens; (2) graph attention for entity tokens, which attend more strongly to tokens they are connected to in the graph than to unconnected ones; and (3) attention that accounts for the type of relationship between each entity token and word token, so that attention between the two is optimized when a relationship exists. The pattern is coupled with special relative position labels, which allow it to integrate with LUKE's entity-aware self-attention mechanism. Experiments show that our model outperforms both the state-of-the-art LUKE-Graph and the baseline LUKE model on ReCoRD, a dataset focused on commonsense reasoning.
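To make the three elements of the attention pattern concrete, the following is a minimal sketch of how such a pattern could be encoded as a matrix of labels and then applied as a mask. It is not the paper's implementation. The token layout (word tokens first, then entity tokens), the label constants, the window size, the function names `build_attention_labels` and `masked_softmax`, and the use of hard masking (where the abstract only says connected tokens receive stronger attention) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical label ids; the paper's actual relative position labels are not given in the abstract.
NO_ATTN, LOCAL, GLOBAL, GRAPH_EDGE, REL_BASE = 0, 1, 2, 3, 4


def build_attention_labels(n_words, n_entities, window, global_words, edges, entity_word_rel):
    """Return an (N, N) integer matrix, N = n_words + n_entities.

    Rows are queries, columns are keys. Word tokens occupy indices
    [0, n_words); entity tokens occupy [n_words, N). This layout is an assumption.
    """
    n = n_words + n_entities
    labels = np.full((n, n), NO_ATTN, dtype=np.int64)

    # (1a) Local attention: each word attends to words within `window` positions.
    for i in range(n_words):
        lo, hi = max(0, i - window), min(n_words, i + window + 1)
        labels[i, lo:hi] = LOCAL

    # (1b) Global attention: selected words attend to, and are attended by, all words.
    for g in global_words:
        labels[g, :n_words] = GLOBAL
        labels[:n_words, g] = GLOBAL

    # (2) Graph attention: an entity attends to itself and to entities it shares an edge with.
    for e in range(n_entities):
        labels[n_words + e, n_words + e] = GRAPH_EDGE
    for a, b in edges:
        labels[n_words + a, n_words + b] = GRAPH_EDGE
        labels[n_words + b, n_words + a] = GRAPH_EDGE

    # (3) Typed entity-word relations: one label per relation type, in both directions.
    for (e, w), rel in entity_word_rel.items():
        labels[n_words + e, w] = REL_BASE + rel
        labels[w, n_words + e] = REL_BASE + rel

    return labels


def masked_softmax(scores, labels):
    """Row-wise softmax over `scores`, with pairs labeled NO_ATTN masked out."""
    masked = np.where(labels != NO_ATTN, scores, -np.inf)
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)


# Toy example: 6 word tokens, 2 entity tokens, word 0 is global,
# the two entities are connected, and each entity is related to one word.
labels = build_attention_labels(
    n_words=6, n_entities=2, window=1, global_words=[0],
    edges=[(0, 1)], entity_word_rel={(0, 2): 0, (1, 4): 1},
)
scores = np.random.default_rng(0).normal(size=labels.shape)
weights = masked_softmax(scores, labels)
```

In a full model, the label matrix would more plausibly index learned biases or separate projection matrices per label rather than act only as a binary mask; the mask is used here because it is the simplest way to show which token pairs each of the three elements covers.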