The rapid growth of scientific publications has made it increasingly difficult to keep literature reviews comprehensive and up to date. Although prior work has focused on automating retrieval and screening, the writing phase of systematic reviews remains largely underexplored, especially with regard to readability and factual accuracy. To address this, we present LiRA (Literature Review Agents), a multi-agent collaborative workflow that emulates the human literature review process. LiRA uses specialized agents for content outlining, subsection writing, editing, and reviewing to produce cohesive and comprehensive review articles. Evaluated on SciReviewGen and a proprietary ScienceDirect dataset, LiRA outperforms current baselines such as AutoSurvey and MASS-Survey in writing and citation quality while maintaining competitive similarity to human-written reviews. We further evaluate LiRA in real-world scenarios using document retrieval and assess its robustness to reviewer model variation. Our findings highlight the potential of agentic LLM workflows, even without domain-specific tuning, to improve the reliability and usability of automated scientific writing.