BERT-based models for Electronic Health Records (EHR) have surged in popularity following the release of BEHRT and Med-BERT. Subsequent models have largely built on these foundations despite the fundamental design choices of these pioneering models remaining underexplored. To address this issue, we introduce CORE-BEHRT, a Carefully Optimized and Rigorously Evaluated BEHRT. Through incremental optimization, we isolate the sources of improvement for key design choices, giving us insights into the effect of data representation and individual technical components on performance. Evaluating this across a set of generic tasks (death, pain treatment, and general infection), we showed that improving data representation can increase the average downstream performance from 0.785 to 0.797 AUROC, primarily when including medication and timestamps. Improving the architecture and training protocol on top of this increased average downstream performance to 0.801 AUROC. We then demonstrated the consistency of our optimization through a rigorous evaluation across 25 diverse clinical prediction tasks. We observed significant performance increases in 17 out of 25 tasks and improvements in 24 tasks, highlighting the generalizability of our findings. Our findings provide a strong foundation for future work and aim to increase the trustworthiness of BERT-based EHR models.
翻译:在BEHRT和Med-BERT发布之后,基于BERT的电子健康记录(EHR)模型迅速流行起来。后续模型大多建立在这些基础之上,尽管这些开创性模型的基本设计选择仍未得到充分探索。为解决这一问题,我们提出了CORE-BEHRT——一种经过精心优化与严格评估的BEHRT。通过增量优化,我们分离了关键设计选择带来的改进来源,从而深入理解数据表示和各个技术组件对性能的影响。在一组通用任务(死亡、疼痛治疗和一般感染)上的评估表明,改进数据表示可将下游平均性能从0.785 AUROC提升至0.797 AUROC,尤其是在纳入用药信息和时间戳时。在此基础上改进架构和训练协议后,下游平均性能进一步增至0.801 AUROC。随后,我们通过25项多样化临床预测任务的严格评估,证明了优化的稳定性。在全部25项任务中,有17项观察到显著性能提升,24项任务性能有所改善,凸显了研究结果的泛化能力。我们的发现为后续研究奠定了坚实基础,并旨在增强基于BERT的EHR模型的可信度。