Explanation graph generation is an important task that aims to produce explanation graphs in response to user input, revealing the internal reasoning process. The task is challenging because of the large discrepancy between unstructured user queries and structured explanation graphs. Current research commonly fine-tunes a text-based pre-trained language model on a small downstream dataset annotated with graphs. However, given the limited scale of available datasets, this approach may be insufficient to bridge the gap between natural language text and structured graphs. In this paper, to alleviate these limitations, we propose a novel pre-training framework, EG3P (Explanation Graph Generation via Generative Pre-training over synthetic graphs), for the explanation graph generation task. Specifically, we first propose a text-to-graph generative task to pre-train the model, with the goal of bridging the text-graph gap. Additionally, we propose an automatic corpus synthesis strategy that produces a large-scale, high-quality corpus, reducing reliance on costly manual annotation. Experimental results on ExplaGraphs demonstrate the effectiveness of EG3P: our model surpasses all baseline systems by remarkable margins. Further analysis also shows that EG3P generates better explanation graphs on actual reasoning tasks such as CommonsenseQA and OpenbookQA.