Time-series anomaly detection (TSAD) with multimodal large language models (MLLMs) is an emerging area, yet a persistent challenge remains: MLLMs rely on coarse time-series heuristics and struggle with the multi-dimensional, detailed reasoning that is vital for understanding complex time-series data. We present AnomSeer to address this by reinforcing the model's ability to ground its reasoning in precise structural details of the time series, unifying anomaly classification, localization, and explanation. At its core, an expert chain-of-thought trace is generated to provide verifiable, fine-grained reasoning derived from classical analyses (e.g., statistical measures, frequency transforms). Building on this, we propose a novel time-series grounded policy optimization (TimerPO) that adds two components to standard reinforcement learning: a time-series grounded advantage based on optimal transport, and an orthogonal projection that ensures this auxiliary fine-grained signal does not interfere with the primary detection objective. Across diverse anomaly scenarios, AnomSeer, built on Qwen2.5-VL-3B/7B-Instruct, outperforms larger commercial baselines (e.g., GPT-4o) in classification and localization accuracy, particularly on point- and frequency-driven anomalies. Moreover, it produces plausible time-series reasoning traces that support its conclusions.
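The abstract names two ingredients of TimerPO, an optimal-transport-based advantage and an orthogonal projection, without giving their formulas. The sketch below is not the authors' method; it only illustrates the two generic operations under simple assumptions: a 1-D optimal transport cost between two equal-length samples with uniform weights, and the removal of the component of an auxiliary vector that lies along a primary vector. The function names `wasserstein_1d` and `project_out` and all inputs are hypothetical.

```python
import numpy as np


def wasserstein_1d(a, b):
    """1-D optimal transport cost (W1) between two equal-size samples.

    Assumes uniform weights; in 1-D the optimal plan matches sorted values.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal-length inputs")
    return float(np.mean(np.abs(a - b)))


def project_out(aux, primary, eps=1e-12):
    """Return the part of `aux` that is orthogonal to `primary`.

    Hypothetical stand-in for "an orthogonal projection": the result has
    zero inner product with `primary`, so adding it does not change the
    component of the combined signal along `primary`.
    """
    aux = np.asarray(aux, dtype=float)
    primary = np.asarray(primary, dtype=float)
    denom = primary @ primary
    if denom < eps:
        # Primary signal is (near) zero: nothing to project out.
        return aux.copy()
    return aux - (aux @ primary) / denom * primary


# Illustrative values only (not from the paper).
primary_adv = np.array([1.0, -1.0, 0.0, 0.0])
aux_adv = np.array([1.0, 0.0, 1.0, 0.0])
aux_orth = project_out(aux_adv, primary_adv)
ot_cost = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

Here `aux_orth` has zero inner product with `primary_adv`, which is the sense in which an auxiliary signal can be kept from interfering with a primary one. How TimerPO actually defines and combines these quantities is not stated in the abstract.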