Large language models (LLMs) have been widely employed for graph-to-text generation. However, finetuning LLMs requires significant training resources and annotation work. In this paper, we explore the capability of generative models to generate descriptive text from graph data in a zero-shot setting. Specifically, we evaluate GPT-3 and ChatGPT on two graph-to-text datasets and compare their performance with that of finetuned models such as T5 and BART. Our results demonstrate that generative models can generate fluent and coherent text, achieving BLEU scores of 10.57 and 11.08 on the AGENDA and WebNLG datasets, respectively. However, our error analysis reveals that generative models still struggle to understand the semantic relations between entities and tend to generate text containing hallucinations or irrelevant information. As part of the error analysis, we use BERT to detect machine-generated text and achieve high macro-F1 scores. We have made the text generated by the generative models publicly available.