EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models

While the general machine learning (ML) community has benefited from public datasets, tasks, and models, the progress of ML in healthcare has been hampered by a lack of such shared assets. The success of foundation models creates new challenges for healthcare ML by requiring access to shared pretrained models to validate performance benefits. We help address these challenges through three contributions. First, we publish a new dataset, EHRSHOT, which contains deidentified structured data from the electronic health records (EHRs) of 6,739 patients from Stanford Medicine. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and not restricted to ICU/ED patients. Second, we publish the weights of CLMBR-T-base, a 141M-parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g., GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR. We provide an end-to-end pipeline for the community to validate and build upon its performance. Third, we define 15 few-shot clinical prediction tasks, enabling evaluation of foundation models on benefits such as sample efficiency and task adaptation. Our model and dataset are available via a research data use agreement from our website: https://ehrshot.stanford.edu. Code to reproduce our results is available at our GitHub repo: https://github.com/som-shahlab/ehrshot-benchmark
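To make the few-shot setting concrete, here is a minimal sketch of how a k-shot training subset might be drawn and how a model's scores might be evaluated. The abstract does not specify the sampling protocol or the metric, so both the balanced sampling (k positives and k negatives) and the use of AUROC are assumptions, and the function names `sample_k_shot` and `auroc` are hypothetical helpers, not part of the released pipeline.

```python
import random
from typing import List, Sequence


def sample_k_shot(labels: Sequence[int], k: int, seed: int = 0) -> List[int]:
    """Return sorted indices of k positive and k negative examples.

    Assumption: binary labels (0/1) and a balanced k-shot draw.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(pos) < k or len(neg) < k:
        raise ValueError("not enough examples of each class for this k")
    return sorted(rng.sample(pos, k) + rng.sample(neg, k))


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In a sample-efficiency study one would typically repeat the draw for several values of k and several seeds, fit a lightweight classifier on the sampled examples, and compare the resulting scores on a held-out test split.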

