A Multi-Center Study on the Adaptability of a Shared Foundation Model for Electronic Health Records

Foundation models hold promise for transforming AI in healthcare by providing modular components that are easily adaptable to downstream healthcare tasks, making AI development more scalable and cost-effective. Structured EHR foundation models, trained on coded medical records from millions of patients, have demonstrated benefits including improved performance with fewer training labels and improved robustness to distribution shifts. However, questions remain about the feasibility of sharing these models across hospitals and about their performance when adapted to local tasks. This multi-center study examined the adaptability of a recently released structured EHR foundation model ($FM_{SM}$), trained on longitudinal medical record data from 2.57M Stanford Medicine patients. Experiments were conducted using EHR data from The Hospital for Sick Children and MIMIC-IV. We assessed both adaptability via continued pretraining on local data and task adaptability, comparing against baselines of models trained from scratch at each site, including a local foundation model. We evaluated the performance of these models on 8 clinical prediction tasks. In both datasets, adapting the off-the-shelf $FM_{SM}$ matched the performance of gradient boosting machine (GBM) models trained locally on all data, while providing a 13% improvement in settings with few task-specific training labels. With continued pretraining on local data, label efficiency improved substantially, such that $FM_{SM}$ required fewer than 1% of training examples to match the fully trained GBM's performance. Continued pretraining was also 60% to 90% more sample-efficient than training local foundation models from scratch. Our findings show that adapting shared EHR foundation models across hospitals provides improved prediction performance at lower cost, underscoring the utility of base foundation models as modular components to streamline the development of healthcare AI.
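To make the label-efficiency comparison concrete, the following is a minimal sketch of how one might compute the smallest fraction of training labels at which an adapted model matches a baseline trained on all labels. The function name `min_fraction_to_match`, the curve format, and every number below are assumptions introduced for illustration; they are not the study's code, metric, or results.

```python
from typing import Optional, Sequence, Tuple


def min_fraction_to_match(
    curve: Sequence[Tuple[float, float]],
    target: float,
) -> Optional[float]:
    """Return the smallest label fraction whose score reaches `target`.

    `curve` holds (fraction_of_training_labels, score) pairs for one model.
    Returns None if no point on the curve reaches the target score.
    """
    # Walk the curve from the smallest label fraction upward.
    for fraction, score in sorted(curve):
        if score >= target:
            return fraction
    return None


# Made-up illustrative values, NOT results from the study.
hypothetical_adapted_curve = [(0.01, 0.70), (0.05, 0.81), (0.25, 0.84), (1.0, 0.85)]
hypothetical_baseline_full_score = 0.80  # baseline trained on all labels

fraction = min_fraction_to_match(
    hypothetical_adapted_curve, hypothetical_baseline_full_score
)
print(fraction)
```

With these hypothetical values the adapted model first reaches the baseline's score at a label fraction of 0.05. In practice each point on the curve would come from training and evaluating on a subsample of labels, ideally averaged over several random subsamples.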
