In any system that uses structured knowledge graph (KG) data as its underlying knowledge representation, KG-to-text generation is a useful tool for turning parts of the graph into text that humans can understand. Recent work has shown that models pretrained on large amounts of text can perform well on the KG-to-text task even when only a relatively small amount of task-specific training data is available. In this paper, we build on this idea by using large language models (LLMs) for zero-shot generation, relying only on what the model can infer about the triple structure from reading the triples themselves. We show that ChatGPT achieves near state-of-the-art performance on some measures of the WebNLG 2020 challenge, but falls behind on others. Additionally, we compare factual, counter-factual and fictional statements, and show that there is a significant connection between what the LLM already knows about the data it is parsing and the quality of the output text.
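To make the zero-shot setup described in the abstract concrete, the following minimal sketch shows one way a set of KG triples could be linearized into a prompt for an LLM. It is an illustration under stated assumptions, not the authors' prompt: the function names `linearize` and `build_zero_shot_prompt`, the `(subject | predicate | object)` layout, the instruction wording, and the example triples are all hypothetical. No model is called; the sketch only builds the prompt string.

```python
from typing import List, Tuple

# A triple is (subject, predicate, object). This tuple layout is an
# assumption made for illustration.
Triple = Tuple[str, str, str]


def linearize(triples: List[Triple]) -> str:
    """Render each triple on its own line as "(subject | predicate | object)"."""
    return "\n".join(f"({s} | {p} | {o})" for s, p, o in triples)


def build_zero_shot_prompt(triples: List[Triple]) -> str:
    """Build a zero-shot prompt: an instruction plus the triples, no examples.

    The instruction wording is hypothetical and is not taken from the paper.
    """
    header = (
        "Write a short, fluent text that expresses exactly the facts in these "
        "(subject | predicate | object) triples, and nothing else."
    )
    return f"{header}\n\n{linearize(triples)}\n\nText:"


if __name__ == "__main__":
    # Illustrative triples in a WebNLG-like style (made up for this sketch).
    example = [
        ("Alan_Bean", "occupation", "Test_pilot"),
        ("Alan_Bean", "birthPlace", "Wheeler,_Texas"),
    ]
    print(build_zero_shot_prompt(example))
```

Because the prompt contains no worked examples, any fluent output depends entirely on what the model can read from the triples and on what it already knows, which is the dependence the factual, counter-factual and fictional comparison is designed to probe.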