Results from randomized controlled trials (RCTs) establish the comparative effectiveness of interventions and are, in turn, critical inputs for evidence-based care. However, RCT results are presented in (often unstructured) natural-language articles describing the design, execution, and outcomes of trials; clinicians must manually extract findings pertaining to interventions and outcomes of interest from such articles. This onerous manual process has motivated work on (semi-)automating the extraction of structured evidence from trial reports. In this work we propose and evaluate a text-to-text model built on instruction-tuned Large Language Models (LLMs) to jointly extract Interventions, Comparators, and Outcomes (ICO elements) from clinical abstracts, and to infer the associated reported results. Manual (expert) and automated evaluations indicate that framing evidence extraction as a conditional generation task and fine-tuning LLMs for this purpose yields considerable gains ($\sim$20 points absolute F1) over the previous state of the art. We perform ablations and error analyses to assess which aspects contribute to model performance and to highlight potential directions for further improvement. We apply our model to a collection of RCTs published through mid-2022, and release a searchable database of structured findings: bit.ly/joint-relations-extraction-mlhc
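To make the conditional-generation framing concrete, the following is a minimal sketch of how structured findings might be linearized into a target string for a text-to-text model and parsed back out of generated text. The separators, the label set, the `Finding` class, and the example trial content are all illustrative assumptions; the abstract does not specify the actual output format or labels used by the model.

```python
from dataclasses import dataclass

# Hypothetical label set and separators, chosen for illustration only;
# they are not taken from the paper.
LABELS = {"increased", "decreased", "no difference"}
FIELD_SEP = " | "
TUPLE_SEP = " ; "


@dataclass(frozen=True)
class Finding:
    """One (intervention, comparator, outcome, result) tuple."""
    intervention: str
    comparator: str
    outcome: str
    result: str


def linearize(findings):
    """Turn a list of findings into a single target string for generation."""
    return TUPLE_SEP.join(
        FIELD_SEP.join((f.intervention, f.comparator, f.outcome, f.result))
        for f in findings
    )


def parse(text):
    """Recover findings from generated text, skipping malformed tuples."""
    findings = []
    for chunk in text.split(TUPLE_SEP.strip()):
        fields = [part.strip() for part in chunk.split(FIELD_SEP.strip())]
        # Keep only complete tuples whose result is a known label.
        if len(fields) == 4 and all(fields) and fields[3] in LABELS:
            findings.append(Finding(*fields))
    return findings


# Made-up example content, not drawn from any real trial.
gold = [
    Finding("aspirin", "placebo", "headache severity", "decreased"),
    Finding("aspirin", "placebo", "nausea", "no difference"),
]
target = linearize(gold)
recovered = parse(target)
```

Under this kind of framing, the model is trained to emit a string like `target` given the abstract text as input, and a tolerant parser such as `parse` discards incomplete or malformed tuples before evaluation. This simple sketch assumes entity spans never contain the separator characters.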