Data-to-text (D2T) generation aims to transform structured data into natural language text. Data-to-text pre-training has proved powerful for enhancing D2T generation and yields impressive performance. However, previous pre-training methods either oversimplified structured data into a flat sequence, ignoring the input structure, or designed training objectives tailored to a specific data structure (e.g., a table or knowledge graph). In this paper, we unify different types of structured data (i.e., tables, key-value data, and knowledge graphs) into a graph format and cast the various data-to-text generation tasks as graph-to-text generation. To effectively exploit the structural information of the input graph, we propose a structure-enhanced pre-training method for D2T generation built on a structure-enhanced Transformer. Concretely, we devise a position matrix for the Transformer that encodes the relative positional information of connected nodes in the input graph. In addition, we propose a new attention matrix that incorporates graph structure into the original Transformer by taking the explicit connectivity structure into account. Extensive experiments on six benchmark datasets show the effectiveness of our model. Our source code is available at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/unid2t.
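To illustrate the general idea of connectivity-aware attention described above, the following is a minimal NumPy sketch (not the paper's actual implementation): a position-bias term is added to the attention scores, and an adjacency-derived mask restricts each node to attend only to nodes it is connected to. All names (`structure_masked_attention`, `adjacency`, `pos_bias`) are illustrative assumptions, not identifiers from the released code.

```python
import numpy as np

def structure_masked_attention(Q, K, V, adjacency, pos_bias):
    """Scaled dot-product attention restricted by graph connectivity.

    adjacency[i, j] = 1 if node i may attend to node j (self-loops included);
    pos_bias[i, j]  = a relative-position score for the connected pair (i, j).
    This is a hypothetical sketch of the general technique, not the paper's API.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k) + pos_bias
    # Mask out unconnected node pairs before the softmax.
    scores = np.where(adjacency.astype(bool), scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy graph with 3 nodes: nodes 0 and 1 are connected; node 2 is isolated.
rng = np.random.default_rng(0)
n, d = 3, 4
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))
adj = np.array([[1, 1, 0],
                [1, 1, 0],
                [0, 0, 1]])
bias = np.zeros((n, n))  # zero bias for simplicity; the paper learns/derives it
out = structure_masked_attention(Q, K, V, adj, bias)
```

Because node 2 attends only to itself under this mask, its output is exactly its own value vector, which makes the effect of the connectivity mask easy to verify.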