Large Language Models (LLMs) have seen significant use in domains such as natural language processing and computer vision. Going beyond text, images, and graphics, LLMs present significant potential for time series analysis, benefiting domains such as climate, IoT, healthcare, traffic, audio, and finance. This survey provides an in-depth exploration and a detailed taxonomy of the methodologies employed to harness LLMs for time series analysis. We address the inherent challenge of bridging the gap between the text data on which LLMs are originally trained and the numerical nature of time series data, and we explore strategies for transferring and distilling knowledge from LLMs to numerical time series analysis. We detail several methodologies: (1) direct prompting of LLMs, (2) time series quantization, (3) alignment techniques, (4) use of the vision modality as a bridging mechanism, and (5) the combination of LLMs with tools. Additionally, this survey offers a comprehensive overview of existing multimodal time series and text datasets, and it examines the challenges and future opportunities of this emerging field. We maintain an up-to-date GitHub repository that includes all the papers and datasets discussed in the survey.