Matching patients to clinical trials is a key unsolved challenge in bringing new drugs to market. Today, identifying patients who meet a trial's eligibility criteria is highly manual, taking up to 1 hour per patient. Automated screening is challenging, however, as it requires understanding unstructured clinical text. Large language models (LLMs) offer a promising solution. In this work, we explore their application to trial matching. First, we design an LLM-based system that, given a patient's medical history as unstructured clinical text, evaluates whether that patient meets a set of inclusion criteria (also specified as free text). Our zero-shot system achieves state-of-the-art scores on the n2c2 2018 cohort selection benchmark. Second, we improve the data and cost efficiency of our method by identifying a prompting strategy that matches patients an order of magnitude faster and more cheaply than the status quo, and we develop a two-stage retrieval pipeline that reduces the number of tokens processed by up to a third while retaining high performance. Third, we assess the interpretability of our system by having clinicians review the natural language justifications generated by the LLM for each eligibility decision, and show that the system can output coherent explanations for 97% of its correct decisions and 75% of its incorrect ones. Our results establish the feasibility of using LLMs to accelerate clinical trial operations.
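To make the two-stage idea concrete, the following is a minimal sketch and not the system described in this work. It assumes the first stage ranks chunks of the patient record against a criterion and keeps only the top few, and the second stage passes those chunks to a judge. All names here (`retrieve_top_k`, `assess_criterion`, `keyword_judge`) are hypothetical, the word-overlap ranking is a stand-in for whatever retriever is actually used, and the judge is a placeholder where an LLM call would go.

```python
import re
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_top_k(criterion: str, chunks: List[str], k: int) -> List[str]:
    """Stage 1 (assumed): rank chunks by word overlap with the criterion."""
    query = set(tokenize(criterion))
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query & set(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:k]


def assess_criterion(
    criterion: str,
    chunks: List[str],
    judge: Callable[[str, List[str]], bool],
    k: int = 2,
) -> Dict[str, object]:
    """Stage 2: send only the retrieved chunks to the judge."""
    selected = retrieve_top_k(criterion, chunks, k)
    return {
        "met": judge(criterion, selected),
        "evidence": selected,
        # Token counts illustrate the saving from retrieval.
        "tokens_total": sum(len(tokenize(c)) for c in chunks),
        "tokens_sent": sum(len(tokenize(c)) for c in selected),
    }


def keyword_judge(criterion: str, selected: List[str]) -> bool:
    """Placeholder for an LLM call: checks for the criterion's last word."""
    keyword = tokenize(criterion)[-1]
    return any(keyword in tokenize(chunk) for chunk in selected)


if __name__ == "__main__":
    notes = [
        "Patient reports chest pain on exertion.",
        "History of type 2 diabetes, on metformin.",
        "No known drug allergies.",
        "Lives with family nearby.",
    ]
    result = assess_criterion("history of diabetes", notes, keyword_judge)
    print(result["met"], result["tokens_sent"], "of", result["tokens_total"])
```

In this sketch the judge sees only the retrieved chunks, so the number of tokens it processes scales with `k` rather than with the length of the full record. The actual saving depends on the retriever and the chosen `k`.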