Although pre-trained models have achieved great success in natural language processing (NLP) and computer vision (CV), progress on general time series analysis has been limited. Unlike NLP and CV, where a unified model can perform different tasks, specially designed approaches still dominate each time series analysis task, such as classification, anomaly detection, forecasting, and few-shot learning. The main challenge blocking the development of pre-trained models for time series analysis is the lack of a large amount of training data. In this work, we address this challenge by leveraging language or CV models, pre-trained on billions of tokens, for time series analysis. Specifically, we refrain from altering the self-attention and feedforward layers of the residual blocks in the pre-trained language or image model. This model, known as the Frozen Pretrained Transformer (FPT), is evaluated through fine-tuning on all major types of tasks involving time series. Our results demonstrate that models pre-trained on natural language or images can achieve comparable or state-of-the-art performance in all main time series analysis tasks, as illustrated in Figure 1. We also found, both theoretically and empirically, that the self-attention module behaves similarly to principal component analysis (PCA), an observation that helps explain how the transformer bridges the domain gap and is a crucial step towards understanding the universality of a pre-trained transformer.
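The freezing rule described above (self-attention and feedforward layers left unchanged during fine-tuning) can be sketched as a simple selection over parameter names. This is a minimal illustration, not the authors' implementation: the parameter names below are hypothetical examples modeled on GPT-2-style naming, and the assumption that every remaining parameter (positional embeddings, layer norms) stays trainable is ours, since the abstract only states which layers are left unaltered.

```python
# Hypothetical parameter names for a one-block transformer (an assumption,
# modeled on GPT-2-style naming; not taken from the paper).
EXAMPLE_PARAM_NAMES = [
    "wpe.weight",              # positional embedding
    "h.0.ln_1.weight",         # layer norm before attention
    "h.0.attn.c_attn.weight",  # self-attention projection
    "h.0.attn.c_proj.weight",  # self-attention output projection
    "h.0.ln_2.weight",         # layer norm before feedforward
    "h.0.mlp.c_fc.weight",     # feedforward expansion
    "h.0.mlp.c_proj.weight",   # feedforward projection
    "ln_f.weight",             # final layer norm
]

# Substrings that mark self-attention and feedforward parameters
# under the naming assumed above.
FROZEN_MARKERS = (".attn.", ".mlp.")


def is_frozen(param_name: str) -> bool:
    """Return True if the parameter belongs to a layer that is kept frozen."""
    return any(marker in param_name for marker in FROZEN_MARKERS)


def split_parameters(names):
    """Partition parameter names into (frozen, trainable) lists, preserving order."""
    frozen = [n for n in names if is_frozen(n)]
    trainable = [n for n in names if not is_frozen(n)]
    return frozen, trainable


frozen, trainable = split_parameters(EXAMPLE_PARAM_NAMES)
print("frozen:", frozen)
print("trainable:", trainable)
```

In a PyTorch model, the same predicate could be applied while iterating over `named_parameters()`, setting `requires_grad = False` on each frozen parameter; the exact name patterns depend on the model implementation being used.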