Mainstream unsupervised anomaly detection algorithms often excel on academic datasets, yet their real-world performance is limited because they are developed under controlled experimental conditions that assume clean training data. Training with noisy data, a prevalent issue in practical anomaly detection, is frequently overlooked. As a pioneering effort, this study investigates label-level noise in sensory time-series anomaly detection (TSAD). It presents a novel and practical end-to-end unsupervised TSAD method for the setting in which the training data are contaminated with anomalies. The proposed approach, called TSAD-C, has no access to anomaly labels during the training phase. TSAD-C comprises three modules: a Decontaminator that rectifies the abnormalities (i.e., noise) present in the training data; a Long-range Variable Dependency Modeling module that captures long-term intra- and inter-variable dependencies within the decontaminated data, which serves as a surrogate for pure normal data; and an Anomaly Scoring module that detects anomalies of all types. Extensive experiments on three reliable datasets demonstrate that our approach outperforms existing methods, establishing a new state of the art in the field.
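To make the three-stage structure described in the abstract concrete, the following is a minimal sketch of a decontaminate, model, then score pipeline. It is not the TSAD-C implementation: the abstract does not specify how each module is built, so every component here is a simple stand-in chosen for illustration. The function names (`decontaminate`, `fit_dependency_model`, `anomaly_score`), the median/MAD filter, the least-squares vector autoregression, and the synthetic data are all assumptions.

```python
import numpy as np

def decontaminate(x, k=3.0):
    """Stand-in decontaminator (assumption, not the paper's module):
    replace values whose robust z-score exceeds k with the per-variable median.
    x has shape (T, D): T time steps, D variables."""
    med = np.median(x, axis=0)
    mad = np.median(np.abs(x - med), axis=0) + 1e-8
    z = np.abs(x - med) / (1.4826 * mad)
    return np.where(z > k, med, x)

def _lagged(x, p):
    """Design matrix holding the previous p steps of all variables plus a bias."""
    T = x.shape[0]
    lags = [x[p - i - 1 : T - i - 1] for i in range(p)]  # lag 1 ... lag p
    return np.hstack(lags + [np.ones((T - p, 1))])

def fit_dependency_model(x, p=5):
    """Stand-in dependency model: a least-squares vector autoregression, so each
    variable is predicted from its own past (intra-variable) and from the past
    of the other variables (inter-variable)."""
    W, *_ = np.linalg.lstsq(_lagged(x, p), x[p:], rcond=None)
    return W

def anomaly_score(x, W, p=5):
    """Stand-in scoring: norm of the one-step prediction error at each step.
    The first p steps have no full history and receive a score of 0."""
    err = np.linalg.norm(x[p:] - _lagged(x, p) @ W, axis=1)
    return np.concatenate([np.zeros(p), err])

# Synthetic demo data (hypothetical): three phase-shifted sinusoids with noise.
rng = np.random.default_rng(0)
t = np.arange(500)

def make_series():
    phases = np.array([0.0, 1.0, 2.0])
    clean = np.sin(2 * np.pi * t[:, None] / 50 + phases)
    return clean + 0.05 * rng.standard_normal((500, 3))

train = make_series()
train[[100, 250, 400], 0] += 8.0   # contaminate training data; no labels are used
test = make_series()
test[300, 1] += 5.0                # anomaly to be detected at test time

W = fit_dependency_model(decontaminate(train))
scores = anomaly_score(test, W)
print(int(np.argmax(scores)))      # index of the highest-scoring time step
```

The design point the sketch tries to mirror is the ordering: the dependency model is fit only on the decontaminated series, which stands in for clean normal data, while scoring is applied to the raw test series. Because the stand-in model uses lagged inputs, a single anomalous point can also raise the scores of the few steps that follow it.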