Systematic reviews are comprehensive reviews of the literature for a highly focused research question. They are often treated as the highest form of evidence in evidence-based medicine and are the key strategy for answering research questions in the medical field. To create a high-quality systematic review, researchers often construct complex Boolean queries to retrieve studies for the review topic. However, constructing a high-quality Boolean query often takes a long time, and the resulting queries are often far from effective. Poor queries may lead to biased or invalid reviews, because they fail to retrieve key evidence, or to a substantial increase in review costs, because they retrieve too many irrelevant studies. Recent advances in Transformer-based generative models have shown great potential to follow user instructions and generate answers based on them. In this paper, we investigate the effectiveness of the latest of such models, ChatGPT, in generating effective Boolean queries for systematic review literature search. Through extensive experiments on standard test collections for the task, we find that ChatGPT is capable of generating queries that lead to high search precision, although at the cost of recall. Overall, our study demonstrates the potential of ChatGPT in generating effective Boolean queries for systematic review literature search. The ability of ChatGPT to follow complex instructions and generate queries with high precision makes it a valuable tool for researchers conducting systematic reviews, particularly for rapid reviews, where time is a constraint and lower recall is often an acceptable price for higher precision.
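To make the precision and recall trade-off concrete, the following minimal sketch computes both measures for two toy result sets. The function name `precision_recall`, the study identifiers, and the two example queries are hypothetical and chosen only for illustration; they are not taken from the experiments reported here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved study IDs."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    # Precision: fraction of retrieved studies that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant studies that were retrieved.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical review with four relevant studies.
relevant = {"s1", "s2", "s3", "s4"}

# A narrow Boolean query retrieves few studies, all of them relevant.
narrow_query = ["s1", "s2"]
# A broad Boolean query retrieves every relevant study plus noise.
broad_query = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]

narrow = precision_recall(narrow_query, relevant)
broad = precision_recall(broad_query, relevant)
print("narrow:", narrow)
print("broad:", broad)
```

In this toy setting the narrow query screens fewer irrelevant studies but misses half of the relevant ones, which is the kind of trade-off that may be acceptable for a rapid review.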