Recent progress at the intersection of large language models (LLMs) and time series (TS) analysis has revealed both promise and fragility. While LLMs can reason over temporal structure given carefully engineered context, they often struggle with numeric fidelity, modality interference, and principled cross-modal integration. We present TS-Debate, a modality-specialized, collaborative multi-agent debate framework for zero-shot time series reasoning. TS-Debate first elicits explicit domain knowledge, then assigns dedicated expert agents to textual context, visual patterns, and numerical signals and coordinates their interaction through a structured debate protocol. Reviewer agents evaluate the agents' claims using a verification-conflict-calibration mechanism, supported by lightweight code execution and numerical lookup for programmatic verification. This architecture preserves modality fidelity, exposes conflicting evidence, and mitigates numeric hallucinations without task-specific fine-tuning. Across 20 tasks spanning three public benchmarks, TS-Debate achieves consistent and significant performance improvements over strong baselines, including standard multimodal debate in which all agents observe all inputs.
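The abstract does not specify how reviewer agents check numeric claims. As an illustration only, the following Python sketch assumes each claim can be reduced to a (statistic, value) pair that is recomputable from the raw series. The names `NumericClaim`, `STATS`, `verify`, and `review`, and the fixed halving penalty, are hypothetical and are not TS-Debate's actual implementation. The sketch shows the three roles named above: verification (recomputing a claim by lookup), conflict detection (agents asserting different values for the same statistic), and calibration (down-weighting claims that fail verification).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NumericClaim:
    agent: str         # hypothetical agent label, e.g. "text", "visual", "numeric"
    statistic: str     # name of the quantity the claim is about
    value: float       # value asserted by the agent
    confidence: float  # agent's self-reported confidence in [0, 1]


# Hypothetical lookup table: statistics a reviewer can recompute from the raw series.
STATS = {
    "max": max,
    "min": min,
    "mean": mean,
    "argmax": lambda xs: float(max(range(len(xs)), key=xs.__getitem__)),
}


def verify(claim, series, tol=1e-6):
    """Recompute the claimed statistic and accept it within a relative tolerance."""
    actual = float(STATS[claim.statistic](series))
    return abs(actual - claim.value) <= tol * max(1.0, abs(actual))


def review(claims, series, penalty=0.5):
    """Verify each claim, scale down unverified confidences, and list conflicts."""
    calibrated = []
    for c in claims:
        ok = verify(c, series)
        # Calibration (assumed rule): unverified claims keep only a fraction of their confidence.
        conf = c.confidence if ok else c.confidence * penalty
        calibrated.append((c.agent, c.statistic, ok, conf))
    # Conflict: two or more agents assert different values for the same statistic.
    values_by_stat = {}
    for c in claims:
        values_by_stat.setdefault(c.statistic, set()).add(c.value)
    conflicts = sorted(s for s, vals in values_by_stat.items() if len(vals) > 1)
    return calibrated, conflicts


# Usage: the "visual" agent hallucinates a peak of 6.0 that the raw series does not contain.
series = [1.0, 3.0, 2.0, 5.0, 4.0]
claims = [
    NumericClaim("numeric", "max", 5.0, 0.9),
    NumericClaim("visual", "max", 6.0, 0.8),
    NumericClaim("text", "argmax", 3.0, 0.6),
]
calibrated, conflicts = review(claims, series)
print(calibrated)
print(conflicts)
```

Here the fabricated peak is flagged both as unverified and as a conflict on `max`. A real system would need tolerance-aware conflict grouping and a richer claim schema than this sketch assumes.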