Bio-SIEVE: Exploring Instruction Tuning Large Language Models for Systematic Review Automation

Medical systematic reviews can be very costly and resource-intensive. We explore how Large Language Models (LLMs) can support and be trained to perform literature screening when provided with a detailed set of selection criteria. Specifically, we instruction-tune LLaMA and Guanaco models to perform abstract screening for medical systematic reviews. Our best model, Bio-SIEVE, outperforms both ChatGPT and trained traditional approaches, and generalises better across medical domains. However, there remains the challenge of adapting the model to safety-first scenarios. We also explore the impact of multi-task training with Bio-SIEVE-Multi, including tasks such as PICO extraction and exclusion reasoning, but find that it is unable to match single-task Bio-SIEVE's performance. We see Bio-SIEVE as an important step towards specialising LLMs for the biomedical systematic review process and explore its future developmental opportunities. We release our models, code and a list of DOIs to reconstruct our dataset for reproducibility.
