In any system that uses structured knowledge graph (KG) data as its underlying knowledge representation, KG-to-text generation is a useful tool for turning parts of the graph into text that humans can understand. Recent work has shown that models pretrained on large amounts of text can perform well on the KG-to-text task even with relatively small amounts of task-specific training data. In this paper, we build on this idea by using large language models to perform zero-shot generation, based on nothing but the model's understanding of the triple structure from the text it can read. We show that ChatGPT achieves near state-of-the-art performance on some measures of the WebNLG 2020 challenge but falls behind on others. Additionally, we compare factual, counterfactual, and fictional statements, and show that there is a significant connection between what the LLM already knows about the data it is parsing and the quality of the output text.
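The zero-shot setup described in the abstract can be illustrated with a minimal sketch of how a set of KG triples might be linearized into a prompt for a large language model. This is not the authors' actual prompt or code: the function names `linearize` and `build_zero_shot_prompt`, the `(subject | predicate | object)` layout, the instruction wording, and the example triples are all assumptions made for illustration. No model is called; the sketch only builds the prompt string.

```python
from typing import List, Tuple

# A triple is (subject, predicate, object). The names and layout are illustrative.
Triple = Tuple[str, str, str]


def linearize(triples: List[Triple]) -> str:
    """Render each triple on its own line as '(subject | predicate | object)'."""
    return "\n".join(f"({s} | {p} | {o})" for s, p, o in triples)


def build_zero_shot_prompt(triples: List[Triple]) -> str:
    """Build a prompt with an instruction and the triples, but no worked examples.

    The instruction text is a hypothetical wording, not taken from the paper.
    """
    if not triples:
        raise ValueError("at least one triple is required")
    instruction = (
        "Convert the following knowledge graph triples, written as "
        "(subject | predicate | object), into fluent English text. "
        "Express every triple and add no other facts."
    )
    return f"{instruction}\n\n{linearize(triples)}\n\nText:"


# Illustrative triples in the underscore style common to WebNLG-like data.
example_triples = [
    ("Alan_Bean", "occupation", "Test_pilot"),
    ("Alan_Bean", "birthPlace", "Wheeler,_Texas"),
]
prompt = build_zero_shot_prompt(example_triples)
print(prompt)
```

The resulting string would be sent to the model as is. Because the prompt contains no example outputs, any fluent text the model produces depends on its own reading of the triple notation, which is the condition the abstract calls zero-shot.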