Large Language Models (LLMs) have seen significant use in domains such as natural language processing and computer vision. Beyond text, images, and graphics, LLMs hold significant potential for time series analysis, benefiting domains such as climate, IoT, healthcare, traffic, audio, and finance. This survey provides an in-depth exploration and a detailed taxonomy of the methodologies used to harness LLMs for time series analysis. We address the inherent challenge of bridging the gap between LLMs' original training on text data and the numerical nature of time series data, and explore strategies for transferring and distilling knowledge from LLMs to numerical time series analysis. We detail various methodologies, including (1) direct prompting of LLMs, (2) time series quantization, (3) aligning techniques, (4) use of the vision modality as a bridging mechanism, and (5) the combination of LLMs with tools. Additionally, this survey offers a comprehensive overview of existing multimodal time series and text datasets and examines the challenges and future opportunities of this emerging field. We maintain an up-to-date GitHub repository that includes all the papers and datasets discussed in the survey.