Clinical trials are vital to drug development and evidence-based medicine, but their success is often hindered by difficulties in patient recruitment. In this work, we investigate the potential of large language models (LLMs) to help individual patients and referral physicians identify suitable clinical trials from a large pool of candidates. Specifically, we introduce TrialGPT, a novel architecture that uses LLMs to predict criterion-level eligibility with detailed explanations; these predictions are then aggregated to rank and exclude candidate clinical trials based on free-text patient notes. We evaluate TrialGPT on three publicly available cohorts comprising 184 patients and 18,238 annotated clinical trials. The experimental results demonstrate three key findings. First, TrialGPT achieves high criterion-level prediction accuracy with faithful explanations. Second, the aggregated trial-level TrialGPT scores are highly correlated with expert eligibility annotations. Third, these scores prove effective in ranking clinical trials and excluding ineligible candidates. Our error analysis suggests that current LLMs still make some mistakes due to limited medical knowledge and limited understanding of domain-specific context. Nonetheless, we believe the explanatory capabilities of LLMs are highly valuable. Future research is warranted on how such AI assistants can be integrated into the routine trial-matching workflow in real-world settings to improve its efficiency.
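The aggregation step described above, in which criterion-level predictions are combined into a trial-level score used for ranking and exclusion, can be illustrated with a minimal sketch. This is not the TrialGPT implementation. The label vocabulary (`met`, `not_met`, `unknown`), the scoring formula, the exclusion rule, and all names below are assumptions chosen for illustration only; the abstract does not specify how aggregation is performed.

```python
from dataclasses import dataclass

# Hypothetical label vocabulary; the abstract does not define one.
MET, NOT_MET, UNKNOWN = "met", "not_met", "unknown"


@dataclass
class CriterionPrediction:
    kind: str         # "inclusion" or "exclusion"
    label: str        # MET, NOT_MET, or UNKNOWN
    explanation: str  # free-text rationale, e.g. as produced by an LLM


def fraction_met(preds):
    """Fraction of the given criteria labelled MET (0.0 if there are none)."""
    return sum(p.label == MET for p in preds) / len(preds) if preds else 0.0


def trial_score(preds):
    """Assumed scoring rule: share of inclusion criteria met
    minus share of exclusion criteria met."""
    inclusion = [p for p in preds if p.kind == "inclusion"]
    exclusion = [p for p in preds if p.kind == "exclusion"]
    return fraction_met(inclusion) - fraction_met(exclusion)


def is_excluded(preds):
    """Assumed exclusion rule: drop a trial if any exclusion criterion is met
    or any inclusion criterion is explicitly not met."""
    return any(
        (p.kind == "exclusion" and p.label == MET)
        or (p.kind == "inclusion" and p.label == NOT_MET)
        for p in preds
    )


def rank_trials(trials):
    """Return the IDs of non-excluded trials, highest score first."""
    kept = {tid: preds for tid, preds in trials.items() if not is_excluded(preds)}
    return sorted(kept, key=lambda tid: trial_score(kept[tid]), reverse=True)


# Toy input for one patient; trial IDs and explanations are made up.
trials = {
    "trial_A": [
        CriterionPrediction("inclusion", MET, "note states age 54"),
        CriterionPrediction("inclusion", MET, "diagnosis documented"),
        CriterionPrediction("exclusion", NOT_MET, "no prior chemotherapy"),
    ],
    "trial_B": [
        CriterionPrediction("inclusion", MET, "note states age 54"),
        CriterionPrediction("inclusion", UNKNOWN, "lab value not reported"),
        CriterionPrediction("exclusion", NOT_MET, "no prior chemotherapy"),
    ],
    "trial_C": [
        CriterionPrediction("inclusion", MET, "diagnosis documented"),
        CriterionPrediction("exclusion", MET, "patient is pregnant"),
    ],
}
ranking = rank_trials(trials)
```

Under these assumed rules, a trial with unresolved criteria is kept but ranked lower, while a trial with a triggered exclusion criterion is removed outright. Keeping the explanation alongside each label lets a reviewer audit why a trial was ranked or dropped.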