Time-series diagnostic reasoning is essential for many applications, yet existing solutions face a persistent gap: general reasoning large language models (GRLMs) possess strong reasoning skills but lack the domain-specific knowledge needed to interpret complex time-series patterns. Conversely, fine-tuned time-series LLMs (TSLMs) understand these patterns but cannot generalize their reasoning to more complicated questions. To bridge this gap, we propose a hybrid knowledge-injection framework that injects TSLM-generated insights directly into the GRLM's reasoning trace, thereby achieving strong time-series reasoning grounded in in-domain knowledge. Because collecting data for knowledge-injection fine-tuning is costly, we further leverage reinforcement learning with verifiable rewards (RLVR) to elicit knowledge-rich traces without human supervision, and then transfer these in-domain reasoning traces into the GRLM for efficient knowledge injection. We also release SenTSR-Bench, a multivariate time-series diagnostic reasoning benchmark collected from real-world industrial operations. Across SenTSR-Bench and other public datasets, our method consistently surpasses TSLMs by 9.1%-26.1% and GRLMs by 7.9%-22.4%, delivering robust, context-aware time-series diagnostic insights.