Recent studies have shown great promise in unsupervised representation learning (URL) for multivariate time series, because URL can learn generalizable representations for many downstream tasks without relying on labels, which are often inaccessible. However, existing approaches usually adopt models originally designed for other domains (e.g., computer vision) to encode time series data and rely on strong assumptions to design learning objectives, which limits their performance. To address these problems, we propose a novel URL framework for multivariate time series that learns time-series-specific shapelet-based representations through the popular contrastive learning paradigm. To the best of our knowledge, this is the first work to explore shapelet-based embeddings in unsupervised general-purpose representation learning. We design a unified shapelet-based encoder and a novel learning objective with multi-grained contrasting and multi-scale alignment to achieve this goal, and employ a data augmentation library to improve generalization. We conduct extensive experiments on tens of real-world datasets to assess representation quality on many downstream tasks, including classification, clustering, and anomaly detection. The results demonstrate the superiority of our method over not only URL competitors but also techniques specially designed for the downstream tasks. Our code is publicly available at https://github.com/real2fish/CSL.