Citation Text Generation (CTG) is a natural language processing (NLP) task that aims to produce text that accurately cites or references a cited document within a source document. The generated text draws on contextual cues from both the source document and the cited paper so that the citation is accurate and relevant. Previous work on citation generation is mainly based on text summarization of documents. Building on this, this paper presents a framework and a comparative study that demonstrate the use of Large Language Models (LLMs) for citation generation. We also show that incorporating the knowledge graph relations of the papers into the prompt improves citation generation results by helping the LLM better learn the relationship between the papers. To assess model performance, we use a subset of the standard S2ORC dataset consisting only of English-language computer science research papers. Vicuna performs best on this task, with 14.15 METEOR, 12.88 ROUGE-1, 1.52 ROUGE-2, and 10.94 ROUGE-L. When knowledge graphs are included, Alpaca performs best, improving by 36.98% in ROUGE-1 and 33.14% in METEOR.
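As an illustration of how knowledge graph relations might be placed in an LLM prompt, the following is a minimal sketch, not the paper's actual prompt. The function name `build_citation_prompt`, the instruction wording, the `(head, relation, tail)` triple format, and the sample inputs are all assumptions made for this example.

```python
from typing import Sequence, Tuple

# (head entity, relation, tail entity); a hypothetical triple format.
Triple = Tuple[str, str, str]


def build_citation_prompt(
    source_abstract: str,
    cited_abstract: str,
    relations: Sequence[Triple] = (),
) -> str:
    """Assemble an instruction prompt for citation text generation.

    The prompt wording is illustrative only; it is not taken from the paper.
    """
    parts = [
        "Write one citation sentence in which the source paper cites the cited paper.",
        f"Source paper abstract: {source_abstract.strip()}",
        f"Cited paper abstract: {cited_abstract.strip()}",
    ]
    if relations:
        # Serialize each triple on its own line so the model can read it as text.
        lines = [f"- {head} | {rel} | {tail}" for head, rel, tail in relations]
        parts.append("Knowledge graph relations:\n" + "\n".join(lines))
    parts.append("Citation sentence:")
    return "\n\n".join(parts)


# Hypothetical inputs, for demonstration only.
prompt = build_citation_prompt(
    "We study citation text generation with large language models.",
    "S2ORC is a large corpus of English-language academic papers.",
    relations=[("S2ORC", "used-for", "citation text generation")],
)
print(prompt)
```

Calling the function without `relations` yields a baseline prompt with no knowledge graph section, so the same builder could serve both sides of a with/without comparison.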