Anomaly detection in temporal data has attracted significant attention across real-world applications, where the goal is to identify unusual events and mitigate potential hazards. In practice, the available data often mix segment-level labels (detected abnormal events, each marked as a segment of time points) with unlabeled data (undetected events), whereas the ideal algorithmic output is a point-level prediction. This large gap in label information between the training data and the prediction targets makes the task challenging. In this study, we formulate this imperfect information as noisy labels and propose NRdetector, a noise-resilient framework for multivariate time series anomaly detection that incorporates confidence-based sample selection, robust segment-level learning, and data-centric point-level detection. In particular, to bridge the information gap between noisy segment-level labels and missing point-level labels, we develop a novel loss function that mitigates label noise and accounts for temporal features: it encourages smoothness across consecutive points and separability between points from segments with different labels. Extensive experiments on real-world multivariate time series datasets with 11 evaluation metrics show that NRdetector achieves consistently robust results, outperforming various baselines adapted to operate in our setting.
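To make the two properties named for the loss more concrete, the following is a minimal sketch and not NRdetector's actual loss, which the abstract does not specify. It assumes per-point anomaly scores for one segment labeled abnormal and one labeled normal. The smoothness term is a mean squared difference between consecutive scores, and the separability term is a hinge-style ranking penalty swapped in purely for illustration. All function names, the `margin` parameter, and the weight `lam` are hypothetical.

```python
import numpy as np


def smoothness_term(scores):
    """Mean squared difference between consecutive point scores.

    Small when neighbouring time points receive similar scores.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.size < 2:
        return 0.0
    return float(np.mean(np.diff(scores) ** 2))


def separability_term(abnormal_scores, normal_scores, margin=1.0):
    """Hinge penalty on the gap between the two segments' peak scores.

    Zero once the highest score in the abnormal-labeled segment exceeds
    the highest score in the normal-labeled segment by at least `margin`.
    This ranking form is an illustrative assumption, not the paper's term.
    """
    abnormal_scores = np.asarray(abnormal_scores, dtype=float)
    normal_scores = np.asarray(normal_scores, dtype=float)
    gap = abnormal_scores.max() - normal_scores.max()
    return float(max(0.0, margin - gap))


def illustrative_loss(abnormal_scores, normal_scores, lam=0.1, margin=1.0):
    """Separability penalty plus a weighted smoothness penalty (hypothetical)."""
    smooth = smoothness_term(abnormal_scores) + smoothness_term(normal_scores)
    return separability_term(abnormal_scores, normal_scores, margin) + lam * smooth


if __name__ == "__main__":
    abnormal = [0.1, 0.9, 0.8, 0.1]  # scores for a segment labeled abnormal
    normal = [0.1, 0.1, 0.2, 0.1]    # scores for a segment labeled normal
    print(illustrative_loss(abnormal, normal))
```

In this sketch only segment-level labels are used: the loss never needs to know which individual points are anomalous, which is the kind of supervision gap the abstract describes. Handling label noise (for example, the confidence-based sample selection mentioned above) is outside the scope of this example.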