Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series

Self-supervised learning (SSL) for clinical time series data has received significant attention in recent literature, since these data are rich and provide important information about a patient's physiological state. However, most existing SSL methods for clinical time series are designed for unimodal time series, such as a sequence of structured features (e.g., lab values and vital signs) or an individual high-dimensional physiological signal (e.g., an electrocardiogram). These methods cannot be readily extended to multimodal time series, in which both structured features and high-dimensional data are recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method, Sequential Multi-Dimensional SSL, in which an SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence, in order to better capture information at both scales. Our strategy is agnostic to the specific form of loss function used at each level: it can be contrastive, as in SimCLR, or non-contrastive, as in VICReg. We evaluate our method on two real-world clinical datasets, where the time series contain sequences of (1) high-frequency electrocardiograms and (2) structured data from lab values and vital signs. Our experimental results indicate that pre-training with our method and then fine-tuning on downstream tasks improves performance over baselines on both datasets, and that in several settings the improvements hold across different self-supervised loss functions.
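
The two-level objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: the weighted-sum combination, the `weight` parameter, and the function names `nt_xent` and `combined_loss` are assumptions made for illustration, and the embeddings are taken as given rather than produced by real encoders. The sketch uses a SimCLR-style NT-Xent loss at both levels, but any loss with the same signature (for example a VICReg-style one) could be passed in instead.

```python
import math


def _normalize(v):
    # Scale a vector to unit L2 norm so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent(view1, view2, temperature=0.1):
    """SimCLR-style NT-Xent loss over two lists of embedding vectors.

    view1[k] and view2[k] are embeddings of two augmented views of item k.
    """
    n = len(view1)
    z = [_normalize(v) for v in view1 + view2]
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same item
        # Similarities to every other embedding (self-similarity excluded).
        logits = [
            sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i
        ]
        pos_logit = sum(a * b for a, b in zip(z[i], z[pos])) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)


def combined_loss(seq_v1, seq_v2, comp_v1, comp_v2, weight=1.0, loss_fn=nt_xent):
    """Sum of a sequence-level and a component-level SSL loss.

    seq_v1, seq_v2:   embeddings of two views of each whole sequence.
    comp_v1, comp_v2: embeddings of two views of each individual
                      high-dimensional data point (e.g. one ECG per timestep).
    weight:           hypothetical trade-off between the two levels (assumption).
    """
    seq_loss = loss_fn(seq_v1, seq_v2)
    comp_loss = loss_fn(comp_v1, comp_v2)
    return seq_loss + weight * comp_loss
```

In this sketch, a batch of 4 sequences with 3 timesteps each would give 4 sequence-level embeddings and 12 component-level embeddings per view. Passing `loss_fn` as an argument mirrors the claim that the strategy does not depend on the particular loss used at each level.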
