Self-Supervised Time Series Representation Learning via Cross Reconstruction Transformer

Unsupervised and self-supervised representation learning for time series is critical because labeled samples are usually scarce in real-world scenarios. Existing approaches mainly rely on the contrastive learning framework, which learns representations by distinguishing similar from dissimilar data pairs. However, they depend on prior knowledge for constructing pairs, require cumbersome sampling policies, and perform unstably under sampling bias. Moreover, few works have focused on effectively modeling cross temporal-spectral relations to extend the capacity of the representations. In this paper, we learn time series representations from a new perspective and propose the Cross Reconstruction Transformer (CRT) to address these problems in a unified way. CRT learns representations through a cross-domain dropping-reconstruction task. Specifically, we transform the time series into the frequency domain and randomly drop certain parts in both the time and frequency domains. Compared with cropping and masking, dropping preserves the global context to the greatest extent. A transformer architecture then captures the cross-domain correlations between temporal and spectral information by reconstructing the data in both domains, which we call Dropped Temporal-Spectral Modeling. To make the representations discriminative in the global latent space, we propose an Instance Discrimination Constraint that reduces the mutual information between different time series and sharpens the decision boundaries. Additionally, we propose a dedicated curriculum learning strategy to optimize CRT, which progressively increases the dropping ratio during training.
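
As a rough illustration of the dropping step and the increasing dropping ratio described in the abstract, the following NumPy sketch splits a series and its spectrum into patches and removes a random subset of them. It is not the authors' implementation. The patch length, the use of the FFT magnitude as the spectral input, the linear schedule, its start and end ratios, and the helper names `to_patches`, `drop_patches`, and `drop_ratio_at` are all assumptions made for this example.

```python
import numpy as np

def to_patches(x, patch_len):
    """Split a 1-D array into non-overlapping patches (trailing remainder is discarded)."""
    n = len(x) // patch_len
    return x[: n * patch_len].reshape(n, patch_len)

def drop_patches(patches, drop_ratio, rng):
    """Remove a random subset of patches.

    Unlike masking, dropped patches are not replaced by a placeholder:
    the returned sequence is simply shorter. The kept indices are returned
    so that positions can still be encoded.
    """
    n = patches.shape[0]
    n_keep = max(1, int(round(n * (1.0 - drop_ratio))))
    keep = np.sort(rng.choice(n, size=n_keep, replace=False))
    return patches[keep], keep

def drop_ratio_at(epoch, n_epochs, start=0.1, end=0.7):
    """Hypothetical linear curriculum: the dropping ratio grows from start to end."""
    t = epoch / max(1, n_epochs - 1)
    return start + t * (end - start)

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * rng.standard_normal(256)

# Frequency-domain view (assumption: only the magnitude spectrum is used here).
magnitude = np.abs(np.fft.rfft(x))

time_patches = to_patches(x, patch_len=16)          # shape (16, 16)
freq_patches = to_patches(magnitude, patch_len=16)  # shape (8, 16)

ratio = drop_ratio_at(epoch=5, n_epochs=10)
time_kept, time_idx = drop_patches(time_patches, ratio, rng)
freq_kept, freq_idx = drop_patches(freq_patches, ratio, rng)

# The surviving patches from both domains form one token sequence
# that a transformer encoder could consume.
tokens = np.concatenate([time_kept, freq_kept], axis=0)
print(ratio, tokens.shape)
```

In this sketch the reconstruction targets would be the full `time_patches` and `freq_patches`. The encoder, the reconstruction loss, and the Instance Discrimination Constraint are left out.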
