Clinical trials are indispensable for medical research and the development of new treatments. However, clinical trials often involve thousands of participants, can take several years to complete, and carry a high probability of failure. Recently, there has been burgeoning interest in virtual clinical trials, which simulate real-world scenarios and have the potential to significantly enhance patient safety, accelerate development, reduce costs, and contribute to broader scientific knowledge in healthcare. Existing research often leverages electronic health records (EHRs) to support clinical trial outcome prediction. Yet, because they are trained on limited clinical trial outcome data, existing approaches frequently struggle to make accurate predictions. Some studies have attempted to generate EHRs to augment model development but have fallen short of personalizing the generation to individual patient profiles. The emergence of large language models (LLMs) has opened new possibilities, as the comprehensive clinical knowledge embedded in them has proven beneficial in addressing medical problems. In this paper, we propose TWIN-GPT, an LLM-based approach to creating digital twins. Given limited data, TWIN-GPT can establish cross-dataset associations of medical information and generate unique, personalized digital twins for different patients, thereby preserving individual patient characteristics. Comprehensive experiments show that digital twins created by TWIN-GPT improve clinical trial outcome prediction, outperforming various previous prediction approaches.