LLM-driven Anomaly Detection (AD) helps improve the understanding and explanation of anomalous behaviors in Time Series (TS). Existing methods suffer from inadequate reasoning ability, deficient multi-turn dialogue capability, and narrow generalization. To this end, we 1) propose a multi-agent-based TS Evolution algorithm named TSEvol. On top of it, we 2) introduce the AD reasoning and multi-turn dialogue dataset TSEData-20K and contribute the Chatbot family for AD, including ChatAD-Llama3-8B, Qwen2.5-7B, and Mistral-7B. Furthermore, we 3) propose the TS Kahneman-Tversky Optimization (TKTO) to enhance ChatAD's cross-task generalization capability. Lastly, we 4) propose an LLM-driven Learning-based AD Benchmark, LLADBench, to evaluate the performance of ChatAD and nine baselines across seven datasets and tasks. Our three ChatAD models achieve substantial gains: up to 34.50% in accuracy, 34.71% in F1, and a 37.42% reduction in false positives. In addition, with TKTO, our optimized ChatAD achieves competitive performance in reasoning and cross-task generalization on classification, forecasting, and imputation.