The Mahalanobis distance is a classical tool used to measure the covariance-adjusted distance between points in $\bbR^d$. In this work, we extend the concept of Mahalanobis distance to separable Banach spaces by reinterpreting it as a Cameron-Martin norm associated with a probability measure. This approach leads to a basis-free, data-driven notion of anomaly distance through the so-called variance norm, which can naturally be estimated using empirical measures of a sample. Our framework generalizes the classical $\bbR^d$, functional $(L^2[0,1])^d$, and kernelized settings; importantly, it accommodates non-injective covariance operators. We prove that the variance norm is invariant under invertible bounded linear transformations of the data, extending previous results that were limited to unitary operators. In the Hilbert space setting, we connect the variance norm to the RKHS of the covariance operator and establish consistency and convergence results for estimation using empirical measures. Using the variance norm, we introduce the notion of a kernelized nearest-neighbour Mahalanobis distance. In an empirical study on 12 real-world data sets, we demonstrate that the kernelized nearest-neighbour Mahalanobis distance outperforms the traditional kernelized Mahalanobis distance for multivariate time series novelty detection, using state-of-the-art time series kernels such as the signature, global alignment, and Volterra reservoir kernels.
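As a point of reference for the classical setting generalized above, the following is a minimal sketch (not the paper's method) of the finite-dimensional Mahalanobis distance, using the Moore-Penrose pseudo-inverse so that a singular, i.e. non-injective, covariance matrix is handled gracefully; the function name and example data are illustrative assumptions:

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Covariance-adjusted distance sqrt((x - mu)^T cov^+ (x - mu)).

    Uses the pseudo-inverse cov^+ so that singular (non-injective)
    covariance matrices are supported, mirroring the generality
    discussed in the abstract.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))

# With an identity covariance, the distance reduces to Euclidean.
d_euclid = mahalanobis([3.0, 4.0], [0.0, 0.0], np.eye(2))

# With a rank-deficient covariance, the pseudo-inverse measures the
# distance within the support of the distribution.
d_singular = mahalanobis([2.0, 0.0], [0.0, 0.0],
                         np.array([[1.0, 0.0], [0.0, 0.0]]))
```

Here `d_euclid` equals the Euclidean norm of `(3, 4)`, and `d_singular` is computed entirely along the first coordinate, the only direction with nonzero variance.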