TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models

Recent advances in multimodal time series learning underscore a paradigm shift from analytics centered on basic patterns toward advanced time series understanding and reasoning. However, existing multimodal time series datasets mostly remain at the level of surface alignment and question answering, without reaching the depth of genuine reasoning. The absence of well-defined tasks that genuinely require time series reasoning, along with the scarcity of high-quality data, has limited progress in building practical time series reasoning models (TSRMs). To this end, we introduce Time Series Reasoning Suite (TSR-Suite), which formalizes four atomic tasks that span three fundamental capabilities for reasoning with time series: (1) perception, acquired through scenario understanding and causality discovery; (2) extrapolation, realized via event-aware forecasting; and (3) decision-making, developed through deliberation over perception and extrapolation. TSR-Suite is the first comprehensive time series reasoning suite that supports not only thorough evaluation but also the data pipeline and training of TSRMs. It contains more than 23K samples, of which 2.3K are carefully curated through a human-guided hierarchical annotation process. Building on this foundation, we introduce TimeOmni-1, the first unified reasoning model designed to address diverse real-world problems demanding time series reasoning. The model is trained in multiple stages, integrating a mixture of task scenarios, novel reward functions, and tailored optimizations. Experiments show that TimeOmni-1 delivers strong out-of-distribution generalization across all tasks and achieves a high rate of valid responses. It significantly improves causality discovery accuracy (64.0% vs. 35.9% with GPT-4.1) and raises the valid response rate by over 6% compared to GPT-4.1 on the event-aware forecasting task.
