Are we certain it's anomalous?

Progress in modelling time series and, more generally, sequences of structured data has recently revitalized research in anomaly detection. The task consists of identifying abnormal behaviors in financial series, IT systems, aerospace measurements, and the medical domain, where anomaly detection may help isolate cases of depression and assist the elderly. Anomaly detection in time series is a complex task because anomalies are rare, temporal correlations are highly non-linear, and the definition of anomalous is sometimes subjective. Here we propose the novel use of Hyperbolic uncertainty for Anomaly Detection (HypAD). HypAD learns, in a self-supervised manner, to reconstruct the input signal. We adopt best practices from the state of the art: the sequence is encoded by an LSTM, which is learned jointly with a decoder that reconstructs the signal, with the aid of GAN critics. Uncertainty is estimated end-to-end by a hyperbolic neural network. By using uncertainty, HypAD can assess whether it is certain about the input signal but fails to reconstruct it because the signal is anomalous, or whether a reconstruction error does not necessarily imply an anomaly because the model is uncertain, e.g. for a complex but regular input signal. The key novel idea is that a "detectable anomaly" is one where the model is certain but predicts wrongly. HypAD outperforms the current state of the art for univariate anomaly detection on established benchmarks based on data from NASA, Yahoo, Numenta, Amazon, and Twitter. It also achieves state-of-the-art performance on a multivariate dataset of anomalous activities in elderly home residences, and it outperforms the baseline on SWaT. Overall, HypAD yields the fewest false alarms at the best performance rate, thanks to successfully identifying detectable anomalies.
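The abstract does not spell out how HypAD turns uncertainty into an anomaly score, so the sketch below only illustrates the "certain but wrong" idea; it is not the authors' implementation. It assumes, hypothetically, that a latent code is mapped into the Poincaré ball with the exponential map at the origin, that certainty is read off as the norm of the resulting point (near the boundary means certain, near the origin means uncertain), and that the score is the per-window reconstruction error multiplied by that certainty. The names `expmap0`, `certainty`, and `detectable_anomaly_score` are illustrative only.

```python
import numpy as np

EPS = 1e-5  # numerical margin so points stay strictly inside the unit ball


def expmap0(v: np.ndarray) -> np.ndarray:
    """Map Euclidean vectors (one per row) into the Poincare ball (curvature -1)
    with the exponential map at the origin: exp_0(v) = tanh(|v|) * v / |v|."""
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), EPS)
    x = np.tanh(norm) * v / norm
    # tanh saturates to 1.0 in floating point, so clip the norm just below 1
    x_norm = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), EPS)
    return np.where(x_norm > 1.0 - EPS, x * (1.0 - EPS) / x_norm, x)


def certainty(z: np.ndarray) -> np.ndarray:
    """Assumption: certainty is the Euclidean norm of a point in the ball,
    close to 1 near the boundary (certain), close to 0 near the origin (uncertain)."""
    return np.linalg.norm(z, axis=-1)


def detectable_anomaly_score(x, x_hat, latent):
    """Hypothetical score: per-window reconstruction error weighted by certainty.
    It is large only when the model is certain AND reconstructs poorly."""
    recon_error = np.mean((x - x_hat) ** 2, axis=-1)
    return certainty(expmap0(latent)) * recon_error


# Toy example: three windows of a 4-sample signal (made-up numbers)
x = np.zeros((3, 4))
x_hat = np.array([[1.0] * 4,      # poor reconstruction
                  [1.0] * 4,      # equally poor reconstruction
                  [0.1] * 4])     # good reconstruction
latent = np.array([[3.0, 0.0],    # far from the origin: certain
                   [0.1, 0.0],    # near the origin: uncertain
                   [3.0, 0.0]])   # certain
scores = detectable_anomaly_score(x, x_hat, latent)
print(scores.argmax())  # prints 0
```

Windows 0 and 1 have the same reconstruction error, but only window 0 receives a high score, because under the stated assumption the model is certain about it. Window 2 is also certain but well reconstructed, so it scores low as well.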


Related content

Anomaly detection

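In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in a dataset. Typically, anomalous items correspond to problems such as bank fraud, structural defects, medical problems, or errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions. In particular, when detecting abuse and network intrusions, the interesting objects are often not rare objects but unexpected bursts of activity. This pattern does not fit the usual statistical definition of an outlier as a rare object, so many anomaly detection methods (especially unsupervised ones) will fail on such data unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro-clusters formed by these patterns.

There are three broad categories of anomaly detection methods.[1] Unsupervised anomaly detection methods detect anomalies in an unlabeled test dataset, under the assumption that the majority of instances are normal, by looking for the instances that fit least well with the rest of the data. Supervised anomaly detection methods require a dataset labeled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of anomaly detection). Semi-supervised anomaly detection methods build a model of normal behavior from a given normal training dataset, and then test the likelihood that a test instance was generated by the learned model.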
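The semi-supervised setting described above can be sketched as follows. This is a minimal illustration, assuming a single multivariate Gaussian as the model of normal behavior and a threshold at the 1st percentile of the training log-likelihoods; both choices, the helper names `fit_gaussian` and `log_likelihood`, and the synthetic data are hypothetical and not prescribed by the text.

```python
import numpy as np


def fit_gaussian(normal_data: np.ndarray):
    """Model of normal behavior: mean and covariance of normal-only training data."""
    mean = normal_data.mean(axis=0)
    # Small ridge keeps the covariance invertible
    cov = np.cov(normal_data, rowvar=False) + 1e-6 * np.eye(normal_data.shape[1])
    return mean, cov


def log_likelihood(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Log-density of each row of x under N(mean, cov)."""
    d = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each row, via a linear solve instead of inv(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


rng = np.random.default_rng(0)
train = rng.normal(size=(500, 2))  # normal behavior only (synthetic)
mean, cov = fit_gaussian(train)

# Assumed threshold: 1st percentile of the training log-likelihoods
threshold = np.percentile(log_likelihood(train, mean, cov), 1)

test = np.array([[0.1, -0.2],   # close to normal behavior
                 [6.0, 6.0]])   # far from it
flags = log_likelihood(test, mean, cov) < threshold  # True marks a likely anomaly
print(flags)
```

Swapping the Gaussian for a richer density model (or, as in HypAD, a reconstruction model) keeps the same structure: fit on normal data only, then flag test instances that the model finds unlikely.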

[Book] Probability and Statistics in Engineering and Science

Do We Really Need Deep Learning Models for Time Series Forecasting?

[CVPR 2022] DINE: Domain Adaptation from Single and Multiple Black-box Predictors

GANs for Anomaly Detection
