Time series analysis is used across a wide range of domains. To reduce labeling costs and benefit a variety of downstream tasks, self-supervised pre-training has recently attracted considerable interest. One mainstream paradigm is masked modeling, which pre-trains deep models by learning to reconstruct masked content from the unmasked part. However, because the semantic information of a time series lies mainly in its temporal variations, the standard practice of randomly masking a portion of time points severely disrupts those variations and makes the reconstruction task too difficult to guide representation learning. We therefore present SimMTM, a Simple pre-training framework for Masked Time-series Modeling. By relating masked modeling to manifold learning, SimMTM recovers masked time points through the weighted aggregation of multiple neighbors outside the manifold, which eases the reconstruction task by assembling corrupted but complementary temporal variations from multiple masked series. SimMTM further learns to uncover the local structure of the manifold, which in turn helps masked modeling. Experimentally, SimMTM achieves state-of-the-art fine-tuning performance against the most advanced time series pre-training methods on two canonical time series analysis tasks, forecasting and classification, in both in-domain and cross-domain settings.
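The aggregation idea can be illustrated with a minimal NumPy sketch. It is not the SimMTM implementation: the abstract does not specify the encoder, the similarity measure, or the weighting scheme, so the identity "encoder", the cosine similarity, the softmax with a `temperature` parameter, and the helper names `random_mask` and `aggregate_neighbors` are all assumptions made here for illustration.

```python
import numpy as np


def random_mask(series, mask_ratio, rng):
    """Zero out a random fraction of time points (illustrative masking)."""
    keep = rng.random(series.shape) >= mask_ratio
    return series * keep


def aggregate_neighbors(reps, target_idx, temperature=0.1):
    """Softmax-weighted average of every representation except the target.

    Assumption: cosine similarity plus a temperature softmax stands in for
    whatever similarity and weighting the actual method learns.
    """
    norms = np.linalg.norm(reps, axis=1) + 1e-8
    unit = reps / norms[:, None]
    sims = unit @ unit[target_idx]      # similarity of each row to the target
    logits = sims / temperature
    logits[target_idx] = -np.inf        # the target must not copy itself
    weights = np.exp(logits - np.max(logits))
    weights /= weights.sum()
    return weights @ reps, weights


rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 64)
original = np.sin(t)

# Several masked views of the same series: each is corrupted differently,
# so the views carry complementary pieces of the temporal variation.
masked_views = np.stack([random_mask(original, 0.5, rng) for _ in range(3)])

# Identity "encoder" (assumption): the raw series serve as representations.
reps = np.vstack([original, masked_views])
reconstruction, weights = aggregate_neighbors(reps, target_idx=0)
print(weights)
```

In this sketch the target row receives zero weight, so the estimate is built only from the masked views. A real pipeline would pass the aggregated representation through a learned decoder rather than use it directly as the reconstruction.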