Progress in modelling time series and, more generally, sequences of structured data has recently revived research in anomaly detection. The task consists of identifying abnormal behaviors in financial series, IT systems, aerospace measurements, and the medical domain, where anomaly detection may aid in isolating cases of depression and in assisting the elderly. Anomaly detection in time series is a complex task because anomalies are rare, temporal correlations are highly non-linear, and the definition of anomalous is sometimes subjective. Here we propose the novel use of Hyperbolic uncertainty for Anomaly Detection (HypAD). HypAD learns to reconstruct the input signal in a self-supervised manner. We adopt best practices from the state of the art: the sequence is encoded by an LSTM, which is learned jointly with a decoder that reconstructs the signal, with the aid of GAN critics. Uncertainty is estimated end-to-end by means of a hyperbolic neural network. By using uncertainty, HypAD can assess whether it is certain about the input signal but fails to reconstruct it because the signal is anomalous, or whether the reconstruction error does not necessarily imply an anomaly because the model is uncertain, e.g. for a complex but regular input signal. The novel key idea is that a \emph{detectable anomaly} is one where the model is certain but predicts wrongly. HypAD outperforms the current state of the art for univariate anomaly detection on established benchmarks based on data from NASA, Yahoo, Numenta, Amazon, and Twitter. It also yields state-of-the-art performance on a multivariate dataset of anomalous activities in elderly home residences, and it outperforms the baseline on SWaT. Overall, HypAD yields the fewest false alarms at the best performance rate, thanks to successfully identifying detectable anomalies.
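The key idea, that an anomaly is detectable when the model is certain yet reconstructs poorly, can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so everything below is an assumption made for illustration: certainty is taken to be the norm of an embedding in the Poincaré ball (points near the boundary are treated as certain, points near the origin as uncertain), and the score is the reconstruction error weighted by that certainty. The function names `expmap0`, `certainty`, and `detectable_anomaly_score` are hypothetical and are not taken from the HypAD implementation.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean vector into the Poincare ball of curvature -c
    using the exponential map at the origin."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


def certainty(embedding, c=1.0):
    """Assumed certainty measure: the embedding's norm rescaled by the
    ball radius 1/sqrt(c), so that 0 is the origin and 1 is the boundary."""
    return math.sqrt(c) * math.sqrt(sum(x * x for x in embedding))


def detectable_anomaly_score(recon_error, embedding, c=1.0):
    """Assumed score: reconstruction error weighted by certainty, so a
    large error only counts when the model is also certain."""
    return recon_error * certainty(embedding, c)


# Two windows with the same reconstruction error but different certainty.
confident = expmap0([3.0, 4.0])    # far from the origin: high certainty
uncertain = expmap0([0.03, 0.04])  # close to the origin: low certainty

score_confident = detectable_anomaly_score(2.0, confident)
score_uncertain = detectable_anomaly_score(2.0, uncertain)
print(score_confident > score_uncertain)
```

Under these assumptions, the window with the confident embedding receives the higher score, while the equally poorly reconstructed but uncertain window is down-weighted. This mirrors the abstract's distinction between a detectable anomaly and a complex but regular signal.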