Can Multimodal LLMs Perform Time Series Anomaly Detection?

Time series anomaly detection (TSAD) is a long-standing problem in Web-scale systems and online infrastructure, underpinning tasks such as service reliability monitoring, system fault diagnosis, and performance optimization. Although large language models (LLMs) have demonstrated strong capabilities in time series analysis, the potential of multimodal LLMs (MLLMs), particularly vision-language models, for TSAD remains largely under-explored. One natural way for humans to detect time series anomalies is through visualization and textual description. This motivates our research question: Can multimodal LLMs perform time series anomaly detection? Existing studies often oversimplify the problem by treating point-wise anomalies as special cases of range-wise ones or by aggregating point anomalies to approximate range-wise scenarios. These simplifications limit our understanding of realistic scenarios such as multi-granular anomalies and irregular time series. To address this gap, we build the VisualTimeAnomaly benchmark to comprehensively investigate the zero-shot capabilities of MLLMs for TSAD, progressing from point-wise to range-wise to variate-wise anomalies and extending to irregular sampling conditions. Our study reveals several key insights into MLLMs for TSAD. Building on these findings, we propose TSAD-Agents, an MLLM-based multi-agent framework for automatic TSAD. The framework comprises scanning, planning, detection, and checking agents that collaborate to reason, plan, and self-reflect. These agents adaptively invoke tools, such as traditional methods and MLLMs, and dynamically switch between text and image modalities to optimize detection performance.
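
To make the four-stage flow (scanning, planning, detection, checking) more concrete, the following is a minimal illustrative sketch, not the TSAD-Agents implementation. Every function name, the length-based modality rule, and the 3.5 threshold are assumptions introduced here for illustration. A robust z-score stands in for whatever traditional method or MLLM call a real detection agent would invoke, and no model is actually queried.

```python
from statistics import median


def scan(series):
    """Scanning step (hypothetical): summarize the series with robust statistics."""
    med = median(series)
    mad = median(abs(x - med) for x in series)  # median absolute deviation
    return {"n": len(series), "median": med, "mad": mad}


def plan(summary):
    """Planning step (hypothetical): choose a tool, a modality, and a threshold."""
    # Assumption: short series are passed as text; longer ones would be
    # rendered as an image for a vision-language model.
    modality = "text" if summary["n"] <= 100 else "image"
    return {"tool": "robust_zscore", "modality": modality, "threshold": 3.5}


def detect(series, summary, chosen_plan):
    """Detection step: robust z-score, a stand-in for a traditional method or MLLM."""
    scale = summary["mad"] or 1.0  # avoid division by zero on flat series
    return [
        i
        for i, x in enumerate(series)
        if 0.6745 * abs(x - summary["median"]) / scale > chosen_plan["threshold"]
    ]


def check(indices):
    """Checking step (hypothetical): merge consecutive indices into (start, end) ranges."""
    ranges = []
    for i in indices:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)
        else:
            ranges.append((i, i))
    return ranges


def run_pipeline(series):
    """Run the four steps in order and return anomalous (start, end) index ranges."""
    summary = scan(series)
    chosen_plan = plan(summary)
    return check(detect(series, summary, chosen_plan))
```

On the toy series `[1.0, 1.1, 0.9, 1.0, 9.0, 1.0, 1.1, 0.9, 5.0, 5.2, 1.0, 1.0]`, this sketch reports the isolated spike at index 4 as a single-point range and the level shift at indices 8 to 9 as one range, which mirrors the point-wise versus range-wise distinction drawn in the abstract.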
