This work considers the asymptotic behavior of the distance between two sample covariance matrices (SCMs). A general result is provided for a class of functionals that can be expressed as sums of traces of functions that are separately applied to each covariance matrix. This class includes conventional metrics, such as the Euclidean distance or the Jeffreys divergence, as well as more sophisticated distances recently derived from Riemannian geometry considerations, such as the log-Euclidean metric. We analyze the asymptotic behavior of this class of functionals by establishing a central limit theorem that describes their asymptotic statistical law. To account for the fact that the sample sizes of the two SCMs are of the same order of magnitude as the observation dimension, the results are derived under the assumption that these parameters grow to infinity while their quotients converge to fixed quantities. Numerical results illustrate how this type of result can be used to predict the performance of these metrics in practical machine learning algorithms, such as clustering of SCMs.
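As a rough illustration of the kind of functionals discussed above, the following sketch computes three of the named distances between two SCMs using NumPy. It is not taken from the paper: the function names, the real-valued Gaussian test data, the dimensions `M`, `N1`, `N2`, and the `1/M` normalization of each trace are all assumptions made for this example, and other conventions for the Jeffreys divergence differ by constant factors.

```python
import numpy as np


def sample_covariance(X):
    """SCM of zero-mean observations stored as the columns of X (M x N)."""
    return X @ X.T / X.shape[1]


def spd_log(R):
    """Matrix logarithm of a symmetric positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(R)
    return (V * np.log(w)) @ V.T


def euclidean_sq(R1, R2):
    """Normalized squared Euclidean (Frobenius) distance: tr[(R1 - R2)^2] / M."""
    D = R1 - R2
    return np.trace(D @ D) / R1.shape[0]


def jeffreys(R1, R2):
    """Symmetrized Kullback-Leibler divergence between zero-mean Gaussians,
    normalized by M (assumed convention): (tr[R1^-1 R2] + tr[R2^-1 R1]) / (2M) - 1."""
    M = R1.shape[0]
    t12 = np.trace(np.linalg.solve(R1, R2))
    t21 = np.trace(np.linalg.solve(R2, R1))
    return (t12 + t21) / (2 * M) - 1.0


def log_euclidean_sq(R1, R2):
    """Normalized squared log-Euclidean distance: tr[(log R1 - log R2)^2] / M."""
    D = spd_log(R1) - spd_log(R2)
    return np.trace(D @ D) / R1.shape[0]


# Hypothetical setting: dimension and sample sizes of the same order of magnitude,
# with N1, N2 > M so that both SCMs are invertible.
rng = np.random.default_rng(0)
M, N1, N2 = 20, 80, 100
R1 = sample_covariance(rng.standard_normal((M, N1)))
R2 = sample_covariance(rng.standard_normal((M, N2)))

print("Euclidean^2    :", euclidean_sq(R1, R2))
print("Jeffreys       :", jeffreys(R1, R2))
print("log-Euclidean^2:", log_euclidean_sq(R1, R2))
```

Both data sets here are drawn with identity covariance, so any nonzero distance comes purely from finite-sample fluctuations. This is the regime in which the ratio of dimension to sample size does not vanish, which is the setting the asymptotic analysis is meant to characterize.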