Spatio-temporal reasoning in time series involves the explicit synthesis of temporal dynamics, spatial dependencies, and textual context. This capability is vital for high-stakes decision-making in systems such as traffic networks, power grids, and disease propagation. However, the field remains underdeveloped because most existing work prioritizes predictive accuracy over reasoning. To address this gap, we introduce ST-Bench, a benchmark of four core tasks (etiological reasoning, entity identification, correlation reasoning, and in-context forecasting) built with a network SDE-based multi-agent data synthesis pipeline. We then propose STReasoner, which enables LLMs to integrate time series, graph structure, and text for explicit reasoning. To promote spatially grounded logic, we introduce S-GRPO, a reinforcement learning algorithm that rewards performance gains specifically attributable to spatial information. Experiments show that STReasoner achieves average accuracy gains between 17% and 135% at only 0.004× the cost of proprietary models and generalizes robustly to real-world data.
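To make the idea of rewarding gains attributable to spatial information concrete, the sketch below shows one way such a signal could be combined with GRPO-style group-relative advantages. It is an illustration under stated assumptions, not the S-GRPO formulation itself: the function names, the additive bonus, the clipping at zero, and the `weight` parameter are all hypothetical. Only the group-wise standardization follows the usual GRPO recipe.

```python
from statistics import mean, pstdev


def spatial_bonus_rewards(with_graph, without_graph, weight=0.5):
    """Hypothetical reward shaping (an assumption, not the paper's formula).

    Each sampled response is scored twice: once with the graph structure in
    the prompt (with_graph) and once with it withheld (without_graph).
    The shaped reward is the with-graph score plus a bonus proportional to
    the positive part of the gain that the graph provides.
    """
    if len(with_graph) != len(without_graph):
        raise ValueError("score lists must have the same length")
    return [
        r_g + weight * max(0.0, r_g - r_n)
        for r_g, r_n in zip(with_graph, without_graph)
    ]


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group, as in GRPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group of four sampled responses to the same prompt (made-up scores).
with_graph = [1.0, 1.0, 0.0, 0.0]     # correctness when the graph is given
without_graph = [0.0, 1.0, 0.0, 0.0]  # correctness when the graph is withheld

rewards = spatial_bonus_rewards(with_graph, without_graph)
advantages = group_relative_advantages(rewards)
```

In this toy group the first response is correct only when the graph is available, so it receives the bonus and the largest advantage. The second response is correct either way and is rewarded less, which is the intended bias toward spatially grounded answers.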