Time series anomaly detection (TSAD) has traditionally focused on binary classification and often lacks the fine-grained categorization and explanatory reasoning required for transparent decision-making. To address these limitations, we propose Time-series Reasoning for Anomaly (Time-RA), a novel task that reformulates TSAD from a discriminative into a generative, reasoning-intensive paradigm. To facilitate this, we introduce RATs40K, the first real-world large-scale multimodal benchmark with ~40,000 samples across 10 domains, integrating raw time series, textual context, and visual plots with structured reasoning annotations. Extensive benchmarking shows that while supervised fine-tuning and visual representations boost diagnostic accuracy and reasoning consistency, performance varies across complex scenarios. Notably, fine-tuned models demonstrate strong "plug-and-play" transferability, outperforming traditional baselines on unseen real-world datasets. Our work establishes a foundation for interpretable, multimodal time series analysis. All code (https://github.com/yyysjz1997/Time-RA) and the RATs40K dataset (https://huggingface.co/datasets/Time-RA/RATs40K) are fully open-sourced to facilitate future research.
翻译:摘要:传统时序异常检测任务主要关注二元分类,往往缺乏透明决策所需的细粒度分类与解释性推理能力。为解决这些局限,我们提出时序异常推理任务(Time-RA),该创新任务将时序异常检测从判别式范式重构为生成式、强推理型范式。为此,我们构建了首个大规模跨模态真实世界基准数据集RATs40K,包含覆盖10个领域的约4万个样本,融合原始时序数据、文本上下文、可视化图表及结构化推理标注。广泛基准测试表明,尽管监督微调和可视化表征能提升诊断准确性与推理一致性,但不同复杂场景下表现存在差异。值得注意的是,微调模型展现出强大的"即插即用"迁移能力,在未见真实世界数据集上优于传统基线。本研究为可解释的跨模态时序分析奠定基础。全部代码(https://github.com/yyysjz1997/Time-RA)及RATs40K数据集(https://huggingface.co/datasets/Time-RA/RATs40K)已完全开源,以促进后续研究。