Recent research on time-series self-supervised models shows great promise in learning semantic representations. However, it has been limited to small-scale datasets, e.g., thousands of temporal sequences. In this work, we make key technical contributions that are tailored to the numerical properties of time-series data and allow the model to scale to large datasets, e.g., millions of temporal sequences. We adopt the Transformer architecture by first partitioning the input into non-overlapping windows. Each window is then characterized by its normalized shape and two scalar values denoting the mean and standard deviation within the window. To embed scalar values that may possess arbitrary numerical amplitudes in a high-dimensional space, we propose a numerically multi-scaled embedding module that enumerates all possible numerical scales for the scalars. The model is pretrained with a simple contrastive objective on a large-scale dataset of over a million sequences, assembled by merging existing public data. We study its transfer performance on a number of univariate and multivariate classification tasks, as well as few-shot learning, unsupervised clustering, and anomaly detection benchmarks. Our method exhibits remarkable improvements over previous pretraining approaches and establishes a new state of the art, even compared with domain-specific non-learning-based methods. Code is available at: \url{https://github.com/chenguolin/NuTime}.
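To make the window decomposition concrete, the following is a minimal NumPy sketch of the idea the abstract describes: splitting a series into non-overlapping windows, representing each by a normalized shape plus its mean and standard deviation, and embedding scalars of arbitrary amplitude across an enumeration of numerical scales. The function names, the window size, the scale set, and the tanh-based scale features are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def window_decompose(x, win=16):
    """Split a 1-D series into non-overlapping windows and describe each by
    (normalized shape, mean, std). Window size and epsilon are illustrative."""
    n = len(x) // win
    windows = x[: n * win].reshape(n, win)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    shape = (windows - mu) / (sigma + 1e-8)  # per-window normalized shape
    return shape, mu.ravel(), sigma.ravel()

def multi_scale_embed(v, scales=(1e-2, 1e-1, 1.0, 1e1, 1e2)):
    """Hypothetical stand-in for the multi-scaled embedding module: each
    enumerated scale yields a bounded feature via tanh(v / scale), so at
    least one scale is informative regardless of the scalar's amplitude."""
    return np.tanh(np.asarray(v)[..., None] / np.asarray(scales))

# Example: a large-amplitude series still yields bounded, scale-aware features.
series = np.sin(np.linspace(0, 20, 64)) * 1e3
shape, mean, std = window_decompose(series, win=16)
emb = multi_scale_embed(mean)
print(shape.shape, emb.shape)  # (4, 16) (4, 5)
```

The intuition is that normalization factors out amplitude from the window's shape, while the enumerated scales let the model read the amplitude itself without any single scale saturating.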