Pretrained Language Models (PLMs) benefit from external knowledge stored in graph structures for various downstream tasks. However, bridging the modality gap between graph structures and text remains a significant challenge. Traditional methods, such as linearizing graphs for PLM input, lose vital graph connectivity, whereas integrating Graph Neural Networks (GNNs) into PLMs requires cumbersome alignment procedures. In this work, we propose GraSAME, a novel graph-guided self-attention mechanism that seamlessly incorporates token-level structural information into PLMs without additional alignment or concatenation steps. As an end-to-end, lightweight multimodal module, GraSAME follows a multi-task learning strategy and effectively bridges the gap between the graph and textual modalities, facilitating dynamic interaction between GNNs and PLMs. Our experiments on the graph-to-text generation task demonstrate that GraSAME outperforms baseline models and achieves results comparable to state-of-the-art (SOTA) models on the WebNLG datasets. Furthermore, compared to SOTA models, GraSAME eliminates the need for extra pre-training tasks to adapt graph inputs and reduces the number of trainable parameters by over 100 million.
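The abstract does not spell out GraSAME's internals, so the sketch below only illustrates one common way to realize graph-guided self-attention inside a PLM: constraining the token-to-token attention matrix with a token-level graph adjacency, so that attention flows along graph edges. The class name `GraphGuidedSelfAttention`, the hard adjacency mask, and the tensor shapes are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch of graph-guided self-attention (assumed mechanism:
# an adjacency-based attention mask; NOT the paper's exact method).
import math
import torch
import torch.nn as nn

class GraphGuidedSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (batch, seq, d_model) token hidden states from the PLM
        # adj: (batch, seq, seq) token-level graph adjacency (1 = edge)
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq, d_head)
        q = q.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        # Graph guidance: attention is only allowed along graph edges;
        # self-loops are kept so every token can attend to itself.
        eye = torch.eye(s, device=x.device, dtype=adj.dtype)
        mask = (adj + eye).clamp(max=1).unsqueeze(1)  # (batch, 1, seq, seq)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = scores.softmax(dim=-1)
        return self.out((attn @ v).transpose(1, 2).reshape(b, s, -1))

# Example usage with illustrative (BERT-sized) dimensions:
layer = GraphGuidedSelfAttention(d_model=768, n_heads=12)
x = torch.randn(2, 16, 768)                  # token hidden states
adj = (torch.rand(2, 16, 16) > 0.8).float()  # toy token-level adjacency
out = layer(x, adj)                          # (2, 16, 768)
```

A hard mask is only one design point; a soft variant would instead add a learned bias derived from GNN edge features to `scores`, which keeps full attention connectivity while still injecting structural information.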