One Fits All: Power General Time Series Analysis by Pretrained LM

Although pre-trained models have achieved great success in natural language processing (NLP) and computer vision (CV), limited progress has been made in general time series analysis. In NLP and CV, a single unified model can perform many different tasks. In time series analysis, by contrast, specially designed approaches still dominate each task, such as classification, anomaly detection, forecasting, and few-shot learning. The main challenge blocking the development of pre-trained models for time series analysis is the lack of large amounts of training data. In this work, we address this challenge by leveraging language or CV models, pre-trained on billions of tokens, for time series analysis. Specifically, we keep the self-attention and feedforward layers of the residual blocks in the pre-trained language or image model frozen. This model, called the Frozen Pretrained Transformer (FPT), is evaluated by fine-tuning on all major types of time series tasks. Our results demonstrate that models pre-trained on natural language or images achieve comparable or state-of-the-art performance on all main time series analysis tasks, as illustrated in Figure~\ref{fig:representation}. We also find, both theoretically and empirically, that the self-attention module behaves similarly to principal component analysis (PCA). This observation helps explain how the transformer bridges the domain gap, and it is a crucial step towards understanding the universality of a pre-trained transformer. The code is publicly available at https://anonymous.4open.science/r/Pretrained-LM-for-TSForcasting-C561.
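The freezing scheme described above can be sketched as a parameter-selection rule. This is a minimal sketch under stated assumptions, not the authors' released code. It assumes Hugging Face GPT-2 parameter naming (`h.<i>.attn.*` for self-attention and `h.<i>.mlp.*` for the feedforward layers), and the helpers `is_frozen` and `split_parameters` are hypothetical names. The abstract only specifies which layers stay frozen; treating every other parameter (layer norms, positional embeddings, any new input or output layers) as trainable is an assumption of this sketch.

```python
from typing import Iterable, List, Tuple


def is_frozen(param_name: str) -> bool:
    """Return True if a parameter belongs to a residual block's self-attention
    or feedforward (MLP) sublayer.

    Assumes Hugging Face GPT-2 naming, e.g. "h.3.attn.c_attn.weight" or
    "h.3.mlp.c_fc.bias"; an optional "transformer." prefix is also handled.
    """
    parts = param_name.split(".")
    in_block = "h" in parts  # residual blocks live under "h.<index>"
    return in_block and ("attn" in parts or "mlp" in parts)


def split_parameters(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition parameter names into (frozen, trainable), preserving order."""
    frozen: List[str] = []
    trainable: List[str] = []
    for name in names:
        (frozen if is_frozen(name) else trainable).append(name)
    return frozen, trainable


# Hypothetical subset of GPT-2 parameter names (one block shown).
example_names = [
    "wpe.weight",
    "h.0.ln_1.weight",
    "h.0.ln_1.bias",
    "h.0.attn.c_attn.weight",
    "h.0.attn.c_proj.weight",
    "h.0.ln_2.weight",
    "h.0.mlp.c_fc.weight",
    "h.0.mlp.c_proj.weight",
    "ln_f.weight",
]
frozen, trainable = split_parameters(example_names)

# With PyTorch and Hugging Face transformers installed (not needed here),
# the rule would be applied roughly as:
#   model = GPT2Model.from_pretrained("gpt2")
#   for name, p in model.named_parameters():
#       p.requires_grad = not is_frozen(name)
```

Keeping the selection as a pure function of the parameter name makes it easy to inspect which weights will be updated before any training starts.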

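The PCA observation can be illustrated with a deliberately simplified toy. This is not the paper's theoretical analysis. It assumes linear self-attention (no softmax) with identity query, key, and value projections, applied to token features X of shape (n, d) that are centered per feature. Under these assumptions the attention output is (X X^T / n) X = X (X^T X / n) = X C, where C is the feature covariance matrix, so each layer behaves like a power-iteration step that concentrates energy on the leading principal component. The function names and the synthetic data below are hypothetical.

```python
import numpy as np


def linear_self_attention(X: np.ndarray) -> np.ndarray:
    """Toy linear self-attention: identity Q/K/V projections, no softmax.

    scores = X X^T / n, output = scores @ X, which equals X @ (X^T X / n).
    """
    n = X.shape[0]
    scores = X @ X.T / n  # (n, n) token-to-token similarities
    return scores @ X     # (n, d) similarity-weighted mix of tokens


def top_pc_energy(Y: np.ndarray, v1: np.ndarray) -> float:
    """Fraction of Y's squared Frobenius norm lying along direction v1."""
    return float(np.sum((Y @ v1) ** 2) / np.sum(Y ** 2))


rng = np.random.default_rng(0)
n, d = 500, 4
# Synthetic features with unequal variances along each axis.
X = rng.standard_normal((n, d)) * np.array([3.0, 1.5, 1.0, 0.5])
X -= X.mean(axis=0)  # center each feature so X^T X / n is the covariance

C = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
v1 = eigvecs[:, -1]                   # leading principal direction

energies = [top_pc_energy(X, v1)]
Y = X.copy()
for _ in range(3):
    # Scores are recomputed from the current Y, so after k layers
    # Y is proportional to X @ C^m with m = (3^k - 1) / 2.
    Y = linear_self_attention(Y)
    Y /= np.linalg.norm(Y)  # rescaling does not change the energy fraction
    energies.append(top_pc_energy(Y, v1))
```

For Y proportional to X C^m, the share of energy on the leading component is λ1^(2m+1) / Σ_i λi^(2m+1), which grows towards 1 as m increases. Real transformer layers use softmax and learned projections, so this exact identity holds only in the toy setting above.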