Time Series Forecasting (TSF) has long been a challenge in time series analysis. Inspired by the success of Large Language Models (LLMs), researchers are now developing Large Time Series Models (LTSMs), universal transformer-based models that use autoregressive prediction, to improve TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datasets. Recent endeavors have studied and evaluated various design choices aimed at enhancing LTSM training and generalization capabilities; these design choices, however, are typically studied in isolation rather than benchmarked collectively. In this work, we introduce LTSM-Bundle, a comprehensive toolbox and benchmark for training LTSMs, spanning pre-processing techniques, model configurations, and dataset configurations. It modularizes and benchmarks LTSMs along multiple dimensions: prompting strategies, tokenization approaches, training paradigms, base model selection, data quantity, and dataset diversity. Furthermore, we combine the most effective design choices identified in our study. Empirical results demonstrate that this combination achieves superior zero-shot and few-shot performance compared to state-of-the-art LTSMs and traditional TSF methods on benchmark datasets.