Making the most of the abundant information in electronic health records (EHR) is rapidly becoming an important topic in the medical domain. Recent work presented a promising framework that embeds all features in raw EHR data regardless of their form and medical code standards. That framework, however, focuses only on encoding EHR with minimal preprocessing and does not consider how to learn an EHR representation that is efficient in computation and memory usage. In this paper, we search for a versatile encoder that not only reduces the large data to a manageable size but also preserves the core patient information needed to perform diverse clinical tasks. We found that a hierarchically structured Convolutional Neural Network (CNN) often outperforms the state-of-the-art model on diverse tasks such as reconstruction, prediction, and generation, even with fewer parameters and less training time. Moreover, making use of the inherent hierarchy of EHR data can boost performance regardless of the backbone model or the clinical task performed. Through extensive experiments, we present concrete evidence for generalizing our findings to real-world practice. We give a clear guideline for building the encoder, based on the findings gathered while exploring numerous settings.