Multivariate time-series data arise in numerous real-world applications (e.g., healthcare and industry) and are information-rich, but they are challenging to exploit because labels are scarce and dimensionality is high. Recent self-supervised learning methods have shown potential for learning rich representations without relying on labels, yet they fall short in learning disentangled embeddings and in addressing inductive biases (e.g., transformation-invariance). To tackle these challenges, we propose TimeDRL, a generic multivariate time-series representation learning framework with disentangled dual-level embeddings. TimeDRL is characterized by three novel features: (i) disentangled derivation of timestamp-level and instance-level embeddings from patched time-series data using a [CLS] token strategy; (ii) timestamp-predictive and instance-contrastive tasks for disentangled representation learning, with the former optimizing timestamp-level embeddings with a predictive loss and the latter optimizing instance-level embeddings with a contrastive loss; and (iii) avoidance of augmentation methods, which eliminates inductive biases such as the transformation-invariance introduced by cropping and masking. Comprehensive experiments on 6 time-series forecasting datasets and 5 time-series classification datasets show that TimeDRL consistently surpasses existing representation learning approaches, achieving an average improvement of 57.98% in MSE for forecasting and 1.25% in accuracy for classification. Furthermore, extensive ablation studies confirm the relative contribution of each component in TimeDRL's architecture, and semi-supervised learning evaluations demonstrate its effectiveness in real-world scenarios, even with limited labeled data.
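As a rough illustration of feature (i), the sketch below shows one way the two embedding levels could be derived from patched data with a [CLS] token. Everything in it is an assumption made for illustration, not TimeDRL's actual implementation: the helper names (`patch_series`, `self_attention`, `encode_with_cls`, `predictive_loss`), the non-overlapping patching, the single untrained attention layer standing in for the encoder, and the MSE reconstruction head standing in for the predictive loss. The instance-contrastive loss is not sketched, since the abstract does not specify its form.

```python
import numpy as np


def patch_series(x, patch_len):
    """Split a (T, C) series into non-overlapping patches of shape (N, patch_len * C).

    Trailing timestamps that do not fill a whole patch are dropped (an assumption).
    """
    T, C = x.shape
    n = T // patch_len
    return x[: n * patch_len].reshape(n, patch_len * C)


def self_attention(h):
    """Single-head, parameter-free softmax self-attention over the token axis."""
    scores = h @ h.T / np.sqrt(h.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ h


def encode_with_cls(patches, cls_token, w_in):
    """Prepend a [CLS] token, run a stand-in encoder, and split the output.

    Returns (instance_embedding, timestamp_embeddings):
      - instance_embedding: encoder output at the [CLS] position, shape (D,)
      - timestamp_embeddings: encoder outputs at the patch positions, shape (N, D)
    """
    tokens = np.vstack([cls_token[None, :], patches])  # (N + 1, patch_len * C)
    h = tokens @ w_in                                   # (N + 1, D)
    z = h + self_attention(h)                           # residual attention mixing
    return z[0], z[1:]


def predictive_loss(timestamp_embeddings, patches, w_out):
    """MSE between patches reconstructed from timestamp-level embeddings and the originals."""
    reconstruction = timestamp_embeddings @ w_out       # (N, patch_len * C)
    return float(np.mean((reconstruction - patches) ** 2))


rng = np.random.default_rng(0)
series = rng.normal(size=(96, 7))        # hypothetical input: 96 timestamps, 7 variables
patches = patch_series(series, patch_len=8)

dim = 16                                  # hypothetical embedding size
cls_token = rng.normal(size=patches.shape[1])
w_in = rng.normal(size=(patches.shape[1], dim)) * 0.1
w_out = rng.normal(size=(dim, patches.shape[1])) * 0.1

instance_emb, timestamp_emb = encode_with_cls(patches, cls_token, w_in)
loss = predictive_loss(timestamp_emb, patches, w_out)
print(instance_emb.shape, timestamp_emb.shape)
```

In this sketch the instance-level embedding is read only from the [CLS] position, and the timestamp-level embeddings only from the patch positions, so a separate objective can be attached to each output.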