Time series are a pervasive data type across diverse application domains, making the effective solution of varied time series tasks a long-standing goal. Recent advances in large language models (LLMs), especially the reasoning abilities unlocked through reinforcement learning (RL), have opened new opportunities for tackling tasks that require long Chain-of-Thought (CoT) reasoning. However, leveraging LLM reasoning for time series remains in its infancy, hindered by the absence of carefully curated time series CoT data for training, limited data efficiency due to underexplored data scheduling, and the lack of RL algorithms tailored to exploit such time series CoT data. In this paper, we introduce VeriTime, a framework that tailors LLMs for time series reasoning through data synthesis, data scheduling, and RL training. First, we propose a data synthesis pipeline that constructs a time-series–text multimodal dataset with process-verifiable annotations. Second, we design a data scheduling mechanism that arranges training samples according to a principled hierarchy of difficulty and task taxonomy. Third, we develop a two-stage reinforcement fine-tuning approach featuring fine-grained, multi-objective rewards that exploit the verifiable process-level CoT data. Extensive experiments show that VeriTime substantially boosts LLM performance across diverse time series reasoning tasks. Notably, it enables compact 3B and 4B models to achieve reasoning capabilities on par with or exceeding those of larger proprietary LLMs.