System for systematic literature review using multiple AI agents: Concept and an empirical evaluation

Systematic Literature Reviews (SLRs) have become the foundation of evidence-based research, enabling researchers to identify, classify, and synthesize existing studies in response to specific research questions. Conducting an SLR remains a largely manual process. In recent years, researchers have made significant progress in automating individual phases of the SLR process, with the aim of reducing the effort and time needed to produce high-quality SLRs. However, AI agent-based models that automate the entire SLR process are still lacking. To address this gap, we introduce a novel multi-AI-agent model designed to fully automate the conduct of an SLR. By leveraging the capabilities of Large Language Models (LLMs), the proposed model streamlines the review process, enhancing efficiency and accuracy. Researchers interact with the model through a user-friendly interface: they enter their research topic, and the model generates a search string that is used to retrieve relevant academic papers. Next, inclusion and exclusion criteria are applied to screen the retrieved papers by title for relevance to the specific research area. The model then autonomously summarizes the abstracts of the remaining papers and retains only those directly related to the field of study. In the final phase, the model analyzes the selected papers in depth with respect to predefined research questions. We evaluated the proposed model by sharing it with ten experienced software engineering researchers for testing and analysis. The researchers expressed strong satisfaction with the model and provided feedback for further improvement. The code for this project is available in the GitHub repository at https://github.com/GPT-Laboratory/SLR-automation.
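The pipeline described above (topic, search string, retrieval, title screening, abstract summarization, analysis against research questions) can be pictured as a chain of agents, each consuming the previous agent's output. The Python sketch below is illustrative only and is not the code in the project repository. Every function name, the stop-word list, and the sample corpus are hypothetical. Simple keyword matching stands in for the LLM call each agent would make, and an in-memory list stands in for an academic database.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stop-word list used by the keyword heuristic below.
STOPWORDS = {"a", "an", "and", "are", "do", "for", "how", "in", "is",
             "of", "the", "to", "what", "which", "with"}


@dataclass
class Paper:
    title: str
    abstract: str


def keywords(text: str) -> list[str]:
    # Lowercase, strip punctuation, and drop stop words.
    words = (w.strip(".,?:;").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def search_string_agent(topic: str) -> str:
    # Stand-in for the LLM agent that turns a topic into a Boolean query.
    return " OR ".join(f'"{k}"' for k in keywords(topic))


def retrieve(query: str, corpus: list[Paper]) -> list[Paper]:
    # Stand-in for an academic database: match any quoted term in title or abstract.
    terms = [t.strip('"') for t in query.split(" OR ")]
    return [p for p in corpus
            if any(t in f"{p.title} {p.abstract}".lower() for t in terms)]


def title_filter_agent(papers: list[Paper], include: list[str],
                       exclude: list[str]) -> list[Paper]:
    # Keep a paper if its title hits an inclusion term and no exclusion term.
    kept = []
    for p in papers:
        title = p.title.lower()
        if any(t in title for t in include) and not any(t in title for t in exclude):
            kept.append(p)
    return kept


def abstract_agent(papers: list[Paper],
                   topic_terms: list[str]) -> list[tuple[Paper, str]]:
    # "Summarize" each abstract as its first sentence; keep only on-topic papers.
    kept = []
    for p in papers:
        summary = p.abstract.split(". ")[0]
        if any(t in summary.lower() for t in topic_terms):
            kept.append((p, summary))
    return kept


def analysis_agent(summaries: list[tuple[Paper, str]],
                   research_questions: list[str]) -> dict[str, list[str]]:
    # Map each research question to titles whose summary shares a keyword with it.
    answers = {}
    for rq in research_questions:
        rq_terms = set(keywords(rq))
        answers[rq] = [p.title for p, s in summaries if rq_terms & set(keywords(s))]
    return answers


def run_slr(topic, corpus, include, exclude, research_questions):
    # Chain the agents in the order described in the abstract.
    query = search_string_agent(topic)
    screened = title_filter_agent(retrieve(query, corpus), include, exclude)
    summaries = abstract_agent(screened, keywords(topic))
    return query, analysis_agent(summaries, research_questions)


# Hypothetical sample corpus for demonstration.
corpus = [
    Paper("LLM agents for code review",
          "We study LLM agents that review code. Results are positive."),
    Paper("Survey of LLM agents in testing",
          "LLM agents generate tests. We discuss limits."),
    Paper("Agent-based traffic simulation",
          "Traffic flow is simulated with rule-based vehicles. "
          "The simulation software is open source."),
    Paper("Blockchain consensus protocols",
          "We compare consensus protocols. No agents are involved."),
]
query, answers = run_slr(
    "LLM agents in software engineering", corpus,
    include=["llm", "agent"], exclude=["survey"],
    research_questions=["Which tasks do LLM agents automate?",
                        "How is blockchain consensus reached?"])
print(query)
print(answers)
```

In this toy run all four papers match the broad search string. Title screening drops the survey (exclusion term) and the blockchain paper (no inclusion term). The abstract stage then drops the traffic-simulation paper because its summary is off-topic. Keeping each stage as a separate function mirrors the multi-agent split, so any one stand-in could be swapped for a real LLM call without changing the others.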
