In this paper, we propose a dimensionless anomaly detection method for multivariate streams. The method is independent of the units of measurement of the individual stream channels, and is in this sense dimensionless. We first introduce the variance norm, a generalisation of the Mahalanobis distance that rigorously handles infinite-dimensional feature spaces and singular empirical covariance matrices. We then combine the variance norm with the path signature, an infinite collection of iterated integrals that provides global features of streams, to obtain SigMahaKNN, a method for anomaly detection on (multivariate) streams. We show that SigMahaKNN is invariant to stream reparametrisation and stream concatenation, and that its discrimination power is graded by the truncation level of the path signature. We implement SigMahaKNN as open-source software and perform extensive numerical experiments, which show significantly improved anomaly detection on streams compared to isolation forest and local outlier factor in applications including language analysis, handwriting analysis, ship movement path analysis and univariate time-series analysis.
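To make the pipeline described above concrete, the following is a minimal sketch and not the authors' open-source implementation. It assumes piecewise-linear streams, truncates the path signature at level 2, and uses the Moore-Penrose pseudo-inverse of the empirical covariance as a finite-dimensional stand-in for the variance norm; the two coincide only for vectors lying in the range of the covariance. The function names `signature_depth2`, `conformance_scores` and `score_paths` are hypothetical and chosen for illustration only.

```python
import numpy as np


def signature_depth2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    `path` has shape (T, d): T sample points of a d-channel stream.
    Returns a feature vector of length d + d*d.
    """
    inc = np.diff(path, axis=0)                # segment increments, shape (T-1, d)
    level1 = inc.sum(axis=0)                   # total increment of each channel
    before = np.cumsum(inc, axis=0) - inc      # increment accumulated before each segment
    # Iterated integral over s < t: cross terms between earlier and later
    # segments, plus the within-segment contribution of each linear piece.
    level2 = before.T @ inc + 0.5 * (inc.T @ inc)
    return np.concatenate([level1, level2.ravel()])


def conformance_scores(corpus, queries):
    """Nearest-neighbour Mahalanobis-type distance of each query to the corpus.

    `corpus` has shape (N, F) and `queries` has shape (Q, F).
    Assumption: the pseudo-inverse replaces the variance norm of the paper.
    """
    cov = np.cov(corpus, rowvar=False)                 # empirical covariance, (F, F)
    cov_pinv = np.linalg.pinv(cov)                     # tolerates a singular covariance
    diffs = queries[:, None, :] - corpus[None, :, :]   # pairwise differences, (Q, N, F)
    sq = np.einsum("qnf,fg,qng->qn", diffs, cov_pinv, diffs)
    return np.sqrt(np.clip(sq, 0.0, None).min(axis=1))


def score_paths(corpus_paths, query_paths):
    """Score raw streams: signature features followed by the nearest-neighbour distance."""
    corpus = np.stack([signature_depth2(p) for p in corpus_paths])
    queries = np.stack([signature_depth2(p) for p in query_paths])
    return conformance_scores(corpus, queries)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic two-channel random walks serve as the corpus of normal streams.
    corpus_paths = [np.cumsum(rng.normal(size=(20, 2)), axis=0) for _ in range(40)]
    query_paths = [np.cumsum(rng.normal(size=(20, 2)), axis=0) for _ in range(3)]
    print(score_paths(corpus_paths, query_paths))
```

In this sketch, a larger score means the stream lies further from its nearest corpus neighbour and is therefore more anomalous. Changing the unit of a channel acts on the signature features as an invertible linear map, so the scores are unchanged up to floating-point error whenever the empirical covariance has full rank.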