Learning representations of Electronic Health Records (EHRs) is an important yet underexplored research topic. Such representations benefit various clinical decision support applications, e.g., medication outcome prediction and patient similarity search. Current approaches rely on task-specific label supervision over vectorized sequential EHRs, which is not applicable to large-scale unsupervised scenarios. Recently, contrastive learning has shown great success on self-supervised representation learning problems; however, the complex temporality of EHRs often degrades its performance. To overcome these problems, we propose Graph Kernel Infomax, a self-supervised graph kernel learning approach that operates on a graphical representation of EHRs. Unlike state-of-the-art methods, we do not alter the graph structure to construct augmented views. Instead, we use Kernel Subspace Augmentation to embed nodes into two geometrically different manifold views. The entire framework is trained by contrasting node and graph representations across these two manifold views with standard contrastive objectives. Empirically, on publicly available benchmark EHR datasets, our approach outperforms the state of the art on clinical downstream tasks. Theoretically, varying the distance metric naturally creates different views that serve as data augmentation without changing the graph structure.
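To make the idea of "views from different distance metrics" concrete, the following is a minimal sketch under stated assumptions, not the paper's Kernel Subspace Augmentation. It assumes that each node is embedded by its kernel similarities to a shared set of anchor points (an illustrative choice), that one view uses a Gaussian (Euclidean-distance) kernel and the other a cosine (angular) kernel, that a graph summary is the mean of its node embeddings, and that the contrastive objective is a symmetric cross-view node-to-graph InfoNCE loss. The anchors, kernels, `gamma`, and temperature `tau` are all hypothetical, and learnable encoders are omitted, so only the forward computation of the objective is shown. The graph structure is never modified; only the metric changes between views.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """View 1 (assumed): Gaussian kernel, driven by Euclidean distance."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_kernel(x: Vector, y: Vector) -> float:
    """View 2 (assumed): cosine similarity, driven by angular distance."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)) + 1e-12)


def kernel_view(nodes: List[Vector], anchors: List[Vector],
                kernel: Callable[[Vector, Vector], float]) -> List[Vector]:
    """Embed each node as its kernel values against shared (hypothetical) anchors."""
    return [[kernel(x, a) for a in anchors] for x in nodes]


def normalize(v: Vector) -> Vector:
    n = math.sqrt(dot(v, v)) + 1e-12
    return [a / n for a in v]


def readout(view: List[Vector]) -> Vector:
    """Graph-level summary: mean of the node embeddings in one view."""
    return [sum(row[j] for row in view) / len(view) for j in range(len(view[0]))]


def node_to_graph_loss(node_views: List[List[Vector]],
                       summaries: List[Vector], tau: float) -> float:
    """InfoNCE: each node's own graph summary is the positive, other graphs are negatives."""
    total, count = 0.0, 0
    for gi, nodes in enumerate(node_views):
        for h in nodes:
            logits = [dot(normalize(h), s) / tau for s in summaries]
            mx = max(logits)
            log_den = mx + math.log(sum(math.exp(z - mx) for z in logits))
            total += log_den - logits[gi]  # -log softmax probability of own graph
            count += 1
    return total / count


def cross_view_infonce(graphs: List[List[Vector]], anchors: List[Vector],
                       tau: float = 0.5) -> float:
    """Contrast nodes in one kernel view against graph summaries in the other view."""
    view_a = [kernel_view(g, anchors, rbf_kernel) for g in graphs]
    view_b = [kernel_view(g, anchors, cosine_kernel) for g in graphs]
    summ_a = [normalize(readout(v)) for v in view_a]
    summ_b = [normalize(readout(v)) for v in view_b]
    # Symmetric objective: A-nodes vs B-summaries and B-nodes vs A-summaries.
    return 0.5 * (node_to_graph_loss(view_a, summ_b, tau)
                  + node_to_graph_loss(view_b, summ_a, tau))
```

As a usage example, `cross_view_infonce([[[1.0, 0.1], [0.9, 0.0]], [[0.0, 1.0], [0.1, 0.8]]], anchors=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])` returns a non-negative scalar. In a trained system this value would be minimized with respect to encoder parameters, which this sketch deliberately leaves out.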