Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, all of which share a common weakness: sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios such as imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on these embeddings to predict the noise added to the masked part. Extensive experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE's efficiency and validity in learning representations of TS data.
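To make the training objective concrete, below is a minimal illustrative sketch, assuming PyTorch, of how an IIF-style mask and an embedding-conditioned noise-prediction loss could be wired together. It is not the authors' implementation; all module and function names (`iif_mask`, `DualEncoderEmbedding`, `NoisePredictor`, `diffusion_loss`) are hypothetical, and the dual-orthogonal encoders with crossover are reduced to a simple sum of a time-wise and a feature-wise encoder.

```python
# Illustrative sketch of the IIF-masked, embedding-conditioned diffusion objective.
# Hypothetical names; not the TSDE reference implementation.
import torch
import torch.nn as nn


def iif_mask(x, p_impute=0.1, horizon=4):
    """Split a batch of series (B, K, L) into observed / masked entries:
    random pointwise hiding (imputation/interpolation) plus a forecasting window."""
    B, K, L = x.shape
    mask = torch.rand(B, K, L) < p_impute          # pointwise hiding
    mask[..., -horizon:] = True                    # hide the last `horizon` steps
    return mask                                    # True = masked (to be denoised)


class DualEncoderEmbedding(nn.Module):
    """Toy stand-in for dual-orthogonal Transformer encoders: one attends over
    time, the other over features; the crossover is simplified to a sum."""
    def __init__(self, d_model=64):
        super().__init__()
        self.proj = nn.Linear(1, d_model)
        self.time_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=1)
        self.feat_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=1)

    def forward(self, x_obs):                      # x_obs: (B, K, L)
        B, K, L = x_obs.shape
        h = self.proj(x_obs.unsqueeze(-1))         # (B, K, L, d)
        h_t = self.time_enc(h.reshape(B * K, L, -1)).reshape(B, K, L, -1)
        h_f = self.feat_enc(h.permute(0, 2, 1, 3).reshape(B * L, K, -1))
        h_f = h_f.reshape(B, L, K, -1).permute(0, 2, 1, 3)
        return h_t + h_f                           # (B, K, L, d)


class NoisePredictor(nn.Module):
    """Toy conditional denoiser: predicts the added noise from the noisy series,
    the observed-part embedding, and the diffusion timestep."""
    def __init__(self, d_model=64, T=50):
        super().__init__()
        self.t_emb = nn.Embedding(T, d_model)
        self.net = nn.Sequential(
            nn.Linear(1 + 2 * d_model, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, x_noisy, emb, t):            # x_noisy: (B,K,L), emb: (B,K,L,d)
        B, K, L = x_noisy.shape
        te = self.t_emb(t).view(B, 1, 1, -1).expand(B, K, L, -1)
        inp = torch.cat([x_noisy.unsqueeze(-1), emb, te], dim=-1)
        return self.net(inp).squeeze(-1)           # predicted noise, (B, K, L)


def diffusion_loss(x, emb_fn, eps_net, mask, T=50):
    """One training step: noise the series, embed the observed part only, and
    score the noise prediction on the masked entries."""
    betas = torch.linspace(1e-4, 0.5, T)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, T, (x.shape[0],))
    a = alpha_bar[t].view(-1, 1, 1)
    noise = torch.randn_like(x)
    x_noisy = a.sqrt() * x + (1 - a).sqrt() * noise
    emb = emb_fn(x * (~mask).float())              # condition on observed part only
    eps_hat = eps_net(x_noisy, emb, t)
    return ((eps_hat - noise)[mask] ** 2).mean()   # loss only on masked entries


# Usage sketch on random data: (batch, features, length) = (8, 5, 32).
x = torch.randn(8, 5, 32)
mask = iif_mask(x)
loss = diffusion_loss(x, DualEncoderEmbedding(), NoisePredictor(), mask)
loss.backward()
```

In this sketch the embedding function sees only the observed entries, while the loss is computed solely on the masked ones, which mirrors the self-supervised setup described in the abstract.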