Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting

Time series forecasting (TSF) is a fundamental and widely studied task, with methods ranging from classical statistical approaches to modern deep learning and multimodal language modeling. Despite their effectiveness, these methods often follow a fast-thinking paradigm that emphasizes pattern extraction and direct value mapping, while overlooking explicit reasoning over temporal dynamics and contextual dependencies. Meanwhile, emerging slow-thinking LLMs (e.g., ChatGPT-o1, DeepSeek-R1) have demonstrated impressive multi-step reasoning capabilities across diverse domains, suggesting a new opportunity to reframe TSF as a structured reasoning task. This motivates a key question: can slow-thinking LLMs effectively reason over temporal patterns to support time series forecasting, even in a zero-shot manner? To investigate this, we propose TimeReasoner, an extensive empirical study that formulates TSF as a conditional reasoning task. We design a series of prompting strategies to elicit inference-time reasoning from pretrained slow-thinking LLMs and evaluate their performance across diverse TSF benchmarks. Our findings reveal that slow-thinking LLMs exhibit non-trivial zero-shot forecasting capabilities, especially in capturing high-level trends and contextual shifts. While preliminary, our study surfaces important insights into the reasoning behaviors of LLMs in temporal domains, highlighting both their potential and their limitations. We hope this work catalyzes further research into reasoning-based forecasting paradigms and paves the way toward more interpretable and generalizable TSF frameworks.

翻译：时间序列预测是一项基础且被广泛研究的任务，其方法涵盖从经典统计方法到现代深度学习和多模态语言建模。尽管这些方法行之有效，但它们通常遵循一种强调模式提取和直接值映射的“快思考”范式，而忽略了对时序动态和上下文依赖关系的显式推理。与此同时，新兴的慢思考大语言模型（例如 ChatGPT-o1、DeepSeek-R1）已在多个领域展现出令人印象深刻的多步推理能力，这为将时间序列预测重新构建为一个结构化推理任务提供了新的契机。这引出了一个关键问题：慢思考大语言模型能否有效地对时序模式进行推理以支持时间序列预测，甚至是以零样本的方式？为探究此问题，本文提出了 TimeReasoner，一项将时间序列预测构建为条件推理任务的广泛实证研究。我们设计了一系列提示策略，以从预训练的慢思考大语言模型中激发推理时的思考，并在多样化的时间序列预测基准上评估其性能。我们的研究结果表明，慢思考大语言模型展现出非平凡的零样本预测能力，尤其是在捕捉高层趋势和上下文变化方面。尽管是初步研究，但我们的工作揭示了大型语言模型在时序领域推理行为的重要见解，突显了其潜力与局限性。我们希望这项工作能推动基于推理的预测范式的进一步研究，并为构建更具可解释性和泛化能力的时间序列预测框架铺平道路。