Representation learning plays a significant role in research areas where data are scarce. This work aims to improve representation learning for clinical time series by deriving universal embeddings for clinical features such as heart rate and blood pressure. We apply self-supervised training paradigms developed for language models to learn high-quality clinical feature embeddings, achieving a finer granularity than existing time-step-level and patient-level representation learning. We visualize the learned embeddings with unsupervised dimension reduction techniques and observe a high degree of consistency with prior clinical knowledge. We also evaluate model performance on the MIMIC-III benchmark and demonstrate the effectiveness of using clinical feature embeddings. We publish our code online for replication.