Clinical trials are vital in advancing drug development and evidence-based medicine, but their success is often hindered by challenges in patient recruitment. In this work, we investigate the potential of large language models (LLMs) to assist individual patients and referral physicians in identifying suitable clinical trials from an extensive selection. Specifically, we introduce TrialGPT, a novel architecture that employs LLMs to predict criterion-level eligibility with detailed explanations; these predictions are then aggregated to rank and exclude candidate clinical trials based on free-text patient notes. We evaluate TrialGPT on three publicly available cohorts comprising 184 patients and 18,238 annotated clinical trials. The experimental results demonstrate several key findings. First, TrialGPT achieves high criterion-level prediction accuracy with faithful explanations. Second, the aggregated trial-level TrialGPT scores are highly correlated with expert eligibility annotations. Third, these scores prove effective in ranking clinical trials and excluding ineligible candidates. Our error analysis suggests that current LLMs still make some mistakes due to limited medical knowledge and limited understanding of domain-specific context. Nonetheless, we believe the explanatory capabilities of LLMs are highly valuable. Future research is warranted on how such AI assistants can be integrated into the routine trial matching workflow in real-world settings to improve the efficiency of that workflow.
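As a rough illustration of how criterion-level predictions might be aggregated into trial-level scores for ranking and exclusion, the Python sketch below uses a hypothetical three-way label set (`met`, `not_met`, `unknown`) and a simple hypothetical scoring rule: the fraction of inclusion criteria met minus the fraction of exclusion criteria triggered. The labels, the scoring rule, the hard-exclusion filter, and the trial identifiers are all illustrative assumptions, not TrialGPT's actual aggregation method, and the LLM call that would produce each label and explanation is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical criterion-level labels (illustrative assumption, not TrialGPT's label set).
MET = "met"
NOT_MET = "not_met"
UNKNOWN = "unknown"


@dataclass
class CriterionPrediction:
    criterion: str       # text of one eligibility criterion
    is_exclusion: bool   # True for exclusion criteria, False for inclusion criteria
    label: str           # one of MET, NOT_MET, UNKNOWN (would come from an LLM)
    explanation: str     # free-text rationale (would come from an LLM)


def trial_score(preds: List[CriterionPrediction]) -> float:
    """Hypothetical rule: fraction of inclusion criteria met minus fraction of exclusion criteria triggered."""
    inclusion = [p for p in preds if not p.is_exclusion]
    exclusion = [p for p in preds if p.is_exclusion]
    inc_met = sum(p.label == MET for p in inclusion) / len(inclusion) if inclusion else 0.0
    exc_met = sum(p.label == MET for p in exclusion) / len(exclusion) if exclusion else 0.0
    return inc_met - exc_met


def is_excluded(preds: List[CriterionPrediction]) -> bool:
    """Hypothetical hard filter: a failed inclusion criterion or a triggered exclusion criterion removes the trial."""
    return any(
        (not p.is_exclusion and p.label == NOT_MET) or (p.is_exclusion and p.label == MET)
        for p in preds
    )


def rank_trials(trials: Dict[str, List[CriterionPrediction]]) -> List[str]:
    """Drop excluded trials, then sort the rest by descending trial-level score."""
    kept = {tid: preds for tid, preds in trials.items() if not is_excluded(preds)}
    return sorted(kept, key=lambda tid: trial_score(kept[tid]), reverse=True)


# Usage with hypothetical trials and hand-written labels standing in for LLM output.
CP = CriterionPrediction
trials = {
    "TRIAL-A": [
        CP("age >= 18", False, MET, "Note states the patient is 54 years old."),
        CP("type 2 diabetes", False, MET, "Diagnosis is documented in the history."),
        CP("pregnancy", True, NOT_MET, "Patient is male."),
    ],
    "TRIAL-B": [
        CP("age >= 18", False, MET, "Note states the patient is 54 years old."),
        CP("prior insulin use", False, UNKNOWN, "Medication history is not described."),
        CP("pregnancy", True, NOT_MET, "Patient is male."),
    ],
    "TRIAL-C": [
        CP("age < 12", False, NOT_MET, "Patient is an adult."),
    ],
}

print(rank_trials(trials))  # ['TRIAL-A', 'TRIAL-B']
```

Keeping the per-criterion explanations alongside the labels means a reviewer can trace any trial-level score back to the specific criteria that drove it; the linear rule here is only one simple way to combine them.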