Making the most of the abundant information in electronic health records (EHR) is rapidly becoming an important topic in the medical domain. Recent work presented a promising framework that embeds every feature in raw EHR data regardless of its form or medical code standard. That framework, however, focuses only on encoding EHR with minimal preprocessing and does not consider how to learn an EHR representation that is efficient in computation and memory usage. In this paper, we search for a versatile encoder that not only reduces the large data to a manageable size but also preserves the core patient information needed to perform diverse clinical tasks. We find that a hierarchically structured Convolutional Neural Network (CNN) often outperforms the state-of-the-art model on diverse tasks such as reconstruction, prediction, and generation, even with fewer parameters and less training time. Moreover, making use of the inherent hierarchy of EHR data can boost performance regardless of the backbone model or the clinical task. Through extensive experiments, we present concrete evidence for generalizing our research findings to real-world practice. Based on the findings gathered while exploring numerous settings, we give a clear guideline for building the encoder.