Large Language Models (LLMs) offer the potential for automatic time series analysis and reporting, a critical task across many domains, including healthcare, finance, climate, and energy. In this paper, we propose a framework for rigorously evaluating the capabilities of LLMs on time series understanding, covering both univariate and multivariate series. We introduce a comprehensive taxonomy of time series features that delineates the characteristics inherent in time series data. Leveraging this taxonomy, we systematically design and synthesize a diverse dataset of time series that embody the outlined features. This dataset serves as a foundation for assessing the proficiency of LLMs in comprehending time series. Our experiments shed light on the strengths and limitations of state-of-the-art LLMs in time series understanding, revealing which features these models comprehend readily and where they falter. In addition, we uncover the sensitivity of LLMs to factors including the formatting of the data, the position of the queried points within a series, and the overall time series length.