In research areas with scarce data, representation learning plays a significant role. This work aims to enhance representation learning for clinical time series by deriving universal embeddings for clinical features, such as heart rate and blood pressure. We adopt self-supervised training paradigms from language models to learn high-quality clinical feature embeddings, achieving a finer granularity than existing time-step and patient-level representation learning. We visualize the learnt embeddings via unsupervised dimension reduction techniques and observe a high degree of consistency with prior clinical knowledge. We also evaluate model performance on the MIMIC-III benchmark and demonstrate the effectiveness of using clinical feature embeddings. We publish our code online for replication.