We propose an approach for adapting the DeBERTa model to electronic health record (EHR) tasks through domain adaptation. We pretrain a small DeBERTa model on a dataset consisting of MIMIC-III discharge summaries, clinical notes, radiology reports, and PubMed abstracts. We compare this model with a DeBERTa model pretrained on clinical texts from our institutional EHR (MeDeBERTa) and with an XGBoost model. We evaluate all models on three benchmark tasks for emergency department outcomes using the MIMIC-IV-ED dataset. We preprocess the data to convert it into text format and generate four versions of the original datasets to compare the effects of data processing and data inclusion. Our proposed approach outperforms the alternative models on two of the three tasks (p<0.001) and matches their performance on the third; using descriptive column names improves performance over using the original column names.
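The conversion of records into text, with either original or descriptive column names, can be pictured with a minimal sketch. The abstract does not specify the serialization template, so the "name: value" format, the separator, the handling of missing values, and the column names and descriptive labels below are all illustrative assumptions rather than the authors' actual preprocessing.

```python
from typing import Mapping, Optional

# Hypothetical mapping from raw column names to descriptive names.
# These labels are assumptions for illustration only.
DESCRIPTIVE_NAMES = {
    "heartrate": "heart rate (beats per minute)",
    "o2sat": "oxygen saturation (percent)",
    "chiefcomplaint": "chief complaint",
}


def row_to_text(
    row: Mapping[str, object],
    descriptive_names: Optional[Mapping[str, str]] = None,
    sep: str = ", ",
) -> str:
    """Serialize one record as 'name: value' pairs joined by `sep`.

    Missing values (None) are skipped. If `descriptive_names` is given,
    each column name is replaced by its descriptive label when one exists;
    otherwise the original column name is kept.
    """
    parts = []
    for column, value in row.items():
        if value is None:
            continue  # assumption: omit missing fields from the text
        name = column
        if descriptive_names is not None:
            name = descriptive_names.get(column, column)
        parts.append(f"{name}: {value}")
    return sep.join(parts)


# Example record with one missing value (hypothetical data).
example_row = {"heartrate": 88, "o2sat": None, "chiefcomplaint": "chest pain"}

original_text = row_to_text(example_row)
descriptive_text = row_to_text(example_row, DESCRIPTIVE_NAMES)
```

Here `original_text` keeps the raw column names while `descriptive_text` uses the longer labels, which mirrors the kind of comparison the abstract describes between original and descriptive column names.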