The effectiveness of self-supervised learning (SSL) for physiological time series depends on whether the pretraining objective preserves information about the underlying physiological state while filtering out unrelated noise. However, existing strategies are limited by their reliance on heuristic principles or poorly constrained generative tasks. To address this limitation, we propose a pretraining framework that exploits the information structure of a dynamical systems generative model across multiple time series. This framework yields our key insight: class identity can be efficiently captured by extracting information about the generative variables tied to the system parameters shared across similar time series samples, while noise unique to individual samples should be discarded. Building on this insight, we propose PULSE, a cross-reconstruction-based pretraining objective for physiological time series datasets that explicitly extracts system information while discarding non-transferable, sample-specific information. We establish theory that provides sufficient conditions for recovering the system information, and we validate it empirically with a synthetic dynamical systems experiment. Furthermore, we apply our method to diverse real-world datasets and show that PULSE learns representations that broadly distinguish semantic classes, increase label efficiency, and improve transfer learning.