Results from Randomized Controlled Trials (RCTs) establish the comparative effectiveness of interventions and are, in turn, critical inputs for evidence-based care. However, RCT results are presented in (often unstructured) natural language articles describing the design, execution, and outcomes of trials; clinicians must manually extract findings pertaining to interventions and outcomes of interest from such articles. This onerous manual process has motivated work on (semi-)automating the extraction of structured evidence from trial reports. In this work we propose and evaluate a text-to-text model built on instruction-tuned Large Language Models (LLMs) to jointly extract Interventions, Comparators, and Outcomes (ICO elements) from clinical abstracts, and to infer the reported results associated with them. Manual (expert) and automated evaluations indicate that framing evidence extraction as a conditional generation task and fine-tuning LLMs for this purpose yields considerable gains ($\sim$20 points absolute F1) over the previous state of the art (SOTA). We perform ablations and error analyses to assess the aspects that contribute to model performance and to highlight potential directions for further improvement. We apply our model to a collection of RCTs published through mid-2022 and release a searchable database of structured findings (anonymously for now): bit.ly/joint-relations-extraction-mlhc