The literature review is an indispensable step in the research process. It helps researchers understand the research problem and the current state of the field while comparing prior works. However, summarizing the literature is challenging and time-consuming. Previous LLM-based studies on literature review have mainly addressed the complete process, including literature retrieval, screening, and summarization. For the summarization step, however, a simple chain-of-thought (CoT) method often fails to produce an extensive comparative summary. In this work, we focus for the first time on literature summarization as an independent step and introduce ChatCite, an LLM agent with human workflow guidance for comparative literature summary. By mimicking the human workflow, the agent first extracts key elements from the relevant literature and then generates summaries using a Reflective Incremental Mechanism. To better evaluate the quality of the generated summaries, we devised an LLM-based automatic evaluation metric, G-Score, modeled on human evaluation criteria. In our experiments, the ChatCite agent outperformed other models across several evaluation dimensions. The literature summaries generated by ChatCite can also be used directly for drafting literature reviews.
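The two-stage workflow described in the abstract (key element extraction, then incremental summary generation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ChatCite implementation: the abstract does not specify prompts, how many drafts are produced, or how reflection chooses among them. The `LLM` callable, the `score` function, the `n_candidates` parameter, and all prompt strings are hypothetical, and plain best-of-n selection is used here as a stand-in for the Reflective Incremental Mechanism.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


def extract_key_elements(llm: LLM, paper_text: str) -> str:
    """Stage 1: ask the model for the key elements of a single paper."""
    return llm(f"Extract the key elements of this paper:\n{paper_text}")


def incremental_summary(
    llm: LLM,
    score: Callable[[str], float],
    references: List[str],
    n_candidates: int = 2,
) -> str:
    """Stage 2: fold the references into the summary one at a time.

    For each reference, several candidate extensions of the current
    summary are drafted and the highest-scoring one is kept. The scoring
    step is an assumed placeholder for the reflective selection.
    """
    summary = ""
    for ref in references:
        elements = extract_key_elements(llm, ref)
        candidates = [
            llm(
                f"Summary so far: {summary}\n"
                f"New reference: {elements}\n"
                f"Draft {i}: extend the summary, comparing with prior work."
            )
            for i in range(n_candidates)
        ]
        # Keep the best draft; ties go to the earliest candidate.
        summary = max(candidates, key=score)
    return summary
```

Because the model and the scorer are passed in as plain callables, the loop can be exercised with stubs before wiring in a real model client or an LLM-based scorer.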