We present a unified theory for Mahalanobis-type anomaly detection on Banach spaces, using ideas from Cameron-Martin theory applied to non-Gaussian measures. This approach leads to a basis-free, data-driven notion of anomaly distance through the so-called variance norm of a probability measure, which can be consistently estimated using empirical measures. Our framework generalizes the classical $\mathbb{R}^d$, functional $(L^2[0,1])^d$, and kernelized settings, including the general case of a non-injective covariance operator. We prove that the variance norm depends solely on the inner product in a given Hilbert space, and hence that the kernelized Mahalanobis distance can naturally be recovered by working on reproducing kernel Hilbert spaces. Using the variance norm, we introduce the notion of a kernelized nearest-neighbour Mahalanobis distance for semi-supervised anomaly detection. In an empirical study on 12 real-world datasets, we demonstrate that the kernelized nearest-neighbour Mahalanobis distance outperforms the traditional kernelized Mahalanobis distance for multivariate time series anomaly detection, using state-of-the-art time series kernels such as the signature, global alignment, and Volterra reservoir kernels. Moreover, we provide an initial theoretical justification of nearest-neighbour Mahalanobis distances by developing concentration inequalities in the finite-dimensional Gaussian case.
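To make the finite-dimensional case concrete, the following is a minimal sketch in $\mathbb{R}^d$ only, not the kernelized construction of the paper. It assumes that, for an empirical measure on $\mathbb{R}^d$, the variance norm reduces to the Mahalanobis norm taken with the pseudo-inverse of the empirical covariance on its range, and is infinite for directions outside that range (the non-injective case). It also assumes that the nearest-neighbour variant is the minimum of this norm over differences between the query point and the training points. The function names `variance_norm_sq`, `mahalanobis`, and `nn_mahalanobis`, as well as the tolerance `rtol`, are hypothetical and chosen for illustration.

```python
import numpy as np


def variance_norm_sq(diffs, cov, rtol=1e-10):
    """Squared Mahalanobis-type norm of each row of `diffs` w.r.t. `cov`.

    Assumption: directions with (numerically) zero variance get an
    infinite norm, which handles a non-injective covariance matrix.
    """
    diffs = np.atleast_2d(diffs)
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric eigendecomposition
    keep = eigvals > rtol * eigvals.max()       # directions with positive variance
    coords = diffs @ eigvecs                    # coordinates in the eigenbasis
    d2 = np.sum(coords[:, keep] ** 2 / eigvals[keep], axis=1)
    # Energy of each row lying outside the range of the covariance.
    off_range = np.sum(coords[:, ~keep] ** 2, axis=1)
    d2[off_range > rtol * np.sum(diffs ** 2, axis=1)] = np.inf
    return d2


def mahalanobis(x, train):
    """Distance from x to the empirical mean of `train`."""
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False, bias=True)
    return float(np.sqrt(variance_norm_sq(x - mean, cov)[0]))


def nn_mahalanobis(x, train):
    """Smallest variance-norm distance from x to any training point
    (an assumed reading of the nearest-neighbour variant)."""
    cov = np.cov(train, rowvar=False, bias=True)
    return float(np.sqrt(variance_norm_sq(x[None, :] - train, cov).min()))


# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 3))
query = np.array([3.0, -2.0, 0.5])
print(mahalanobis(query, train), nn_mahalanobis(query, train))
```

In this sketch both distances share the same covariance estimate; the only difference is whether the query is compared with the mean or with its closest training point.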