With the advancement of large language models (LLMs), diverse time series analysis tasks are being reformulated as time series question answering (TSQA) through a unified natural language interface. However, existing LLM-based approaches largely adopt general-purpose natural language processing techniques and are prone to reasoning errors when handling complex numerical sequences. Unlike purely textual tasks, time series data are inherently verifiable: each reasoning step can be checked for consistency against the original input. Motivated by this property, we propose T3LLM, which performs multi-step reasoning with an explicit correction mechanism for TSQA. The T3LLM framework consists of three LLMs (a worker, a reviewer, and a student) that are responsible for generation, review, and reasoning learning, respectively. Within this framework, the worker generates step-wise chains of thought (CoTs) under structured prompts, while the reviewer inspects the reasoning, identifies erroneous steps, and provides corrective comments. The collaboratively generated, corrected CoTs are then used to fine-tune the student model, internalizing multi-step reasoning and self-correction into its parameters. Experiments on multiple real-world TSQA benchmarks demonstrate that T3LLM outperforms strong LLM-based baselines and achieves state-of-the-art performance.
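The worker-reviewer interaction described in the abstract can be illustrated with a minimal sketch. The abstract does not specify prompts, stopping rules, or data formats, so everything below is an assumption made for illustration: the names `Review`, `build_corrected_cot`, `toy_worker`, `toy_reviewer`, and the `max_rounds` parameter are hypothetical, and the two "LLMs" are replaced by plain Python stubs so that the loop can be run without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Review:
    """Reviewer verdict (hypothetical format, not taken from the paper)."""
    bad_step: Optional[int]  # index of the first erroneous step, None if all pass
    comment: str = ""


# Hypothetical call signatures for the two roles.
Worker = Callable[[List[float], str, List[str], Optional[Review]], List[str]]
Reviewer = Callable[[List[float], str, List[str]], Review]


def build_corrected_cot(series: List[float], question: str,
                        worker: Worker, reviewer: Reviewer,
                        max_rounds: int = 3) -> Optional[List[str]]:
    """Generate a CoT, let the reviewer flag a step, and regenerate from there.

    Returns the accepted list of steps, or None if the reviewer still
    objects after max_rounds corrections (assumed stopping rule).
    """
    steps = worker(series, question, [], None)
    for _ in range(max_rounds):
        review = reviewer(series, question, steps)
        if review.bad_step is None:
            return steps
        # Keep the steps before the flagged one and redo the rest.
        kept = steps[:review.bad_step]
        steps = kept + worker(series, question, kept, review)
    final = reviewer(series, question, steps)
    return steps if final.bad_step is None else None


def toy_worker(series, question, kept, review):
    """Stub worker: its first attempt wrongly takes the first value as the max."""
    if review is None:
        return [f"step 1: the series has {len(series)} points",
                f"step 2: max={series[0]}"]
    return [f"step 2: max={max(series)}"]


def toy_reviewer(series, question, steps):
    """Stub reviewer: checks any 'max=' claim directly against the input series."""
    for i, step in enumerate(steps):
        if "max=" in step:
            claimed = float(step.split("max=")[1])
            if claimed != max(series):
                return Review(i, "claimed maximum does not match the input")
    return Review(None)


if __name__ == "__main__":
    cot = build_corrected_cot([1.0, 5.0, 3.0], "What is the maximum?",
                              toy_worker, toy_reviewer)
    print(cot)
```

In this sketch the reviewer exploits the verifiability of the input: a claim about the maximum is checked against the series itself rather than judged on plausibility. An accepted list of steps would then serve as one fine-tuning target for the student model; how T3LLM actually formats and filters such targets is not stated in the abstract.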