ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection

Anomaly detection in multivariate time series data is of paramount importance for ensuring the efficient operation of large-scale systems across diverse domains. However, accurately detecting anomalies in such data poses significant challenges. Existing approaches, including forecasting and reconstruction-based methods, struggle to address these challenges effectively. To overcome these limitations, we propose a novel anomaly detection framework named ImDiffusion, which combines time series imputation and diffusion models to achieve accurate and robust anomaly detection. The imputation-based approach employed by ImDiffusion leverages the information from neighboring values in the time series, enabling precise modeling of temporal and inter-correlated dependencies, reducing uncertainty in the data, thereby enhancing the robustness of the anomaly detection process. ImDiffusion further leverages diffusion models as time series imputers to accurately capturing complex dependencies. We leverage the step-by-step denoised outputs generated during the inference process to serve as valuable signals for anomaly prediction, resulting in improved accuracy and robustness of the detection process. We evaluate the performance of ImDiffusion via extensive experiments on benchmark datasets. The results demonstrate that our proposed framework significantly outperforms state-of-the-art approaches in terms of detection accuracy and timeliness. ImDiffusion is further integrated into the real production system in Microsoft and observe a remarkable 11.4% increase in detection F1 score compared to the legacy approach. To the best of our knowledge, ImDiffusion represents a pioneering approach that combines imputation-based techniques with time series anomaly detection, while introducing the novel use of diffusion models to the field.

翻译：多变量时间序列数据的异常检测对于保障各领域大规模系统的高效运行至关重要。然而，在这类数据中准确检测异常面临重大挑战。现有方法（包括基于预测和重构的方法）难以有效应对这些挑战。为突破这些局限，我们提出了一种名为ImDiffusion的新型异常检测框架，该框架结合时间序列插补与扩散模型，实现了精准且鲁棒的异常检测。ImDiffusion采用的基于插补的方法充分利用时间序列中相邻值的信息，能够精确建模时间依赖与相互关联性，降低数据不确定性，从而增强异常检测过程的鲁棒性。ImDiffusion进一步利用扩散模型作为时间序列插补器，以准确捕捉复杂依赖关系。我们利用推理过程中逐步生成的去噪输出作为异常预测的有效信号，从而提升检测过程的准确性与鲁棒性。通过在基准数据集上开展的大量实验，我们评估了ImDiffusion的性能。结果表明，该框架在检测精度和时效性上显著优于现有最先进方法。ImDiffusion已进一步集成至微软实际生产系统，与原有方法相比，检测F1分数实现了11.4%的显著提升。据我们所知，ImDiffusion是首个将基于插补的技术与时间序列异常检测相结合，并将扩散模型创新引入该领域的先驱性方法。

相关内容

异常检测

关注 102

在数据挖掘中，异常检测（英语：anomaly detection）对不符合预期模式或数据集中其他项目的项目、事件或观测值的识别。通常异常项目会转变成银行欺诈、结构缺陷、医疗问题、文本错误等类型的问题。异常也被称为离群值、新奇、噪声、偏差和例外。特别是在检测滥用与网络入侵时，有趣性对象往往不是罕见对象，但却是超出预料的突发活动。这种模式不遵循通常统计定义中把异常点看作是罕见对象，于是许多异常检测方法（特别是无监督的方法）将对此类数据失效，除非进行了合适的聚集。相反，聚类分析算法可能可以检测出这些模式形成的微聚类。有三大类异常检测方法。[1] 在假设数据集中大多数实例都是正常的前提下，无监督异常检测方法能通过寻找与其他数据最不匹配的实例来检测出未标记测试数据的异常。监督式异常检测方法需要一个已经被标记“正常”与“异常”的数据集，并涉及到训练分类器（与许多其他的统计分类问题的关键区别是异常检测的内在不均衡性）。半监督式异常检测方法根据一个给定的正常训练数据集创建一个表示正常行为的模型，然后检测由学习模型生成的测试实例的可能性。

O’Reilly报告：知识图谱崛起——面向现代数据集成和数据结构体系，“The Rise of the Knowledge Graph——Toward Modern Data Integration and the Data Fabric Architecture”

专知会员服务

49+阅读 · 2022年2月18日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日