Obtaining an accurate estimate of the underlying covariance matrix from finite-sample data is challenging because of the noise induced by the limited sample size. In recent years, sophisticated covariance-cleaning techniques based on random matrix theory have been proposed to address this issue. Most of these methods aim to obtain an optimal covariance matrix estimator by minimizing the Frobenius norm distance as a measure of the discrepancy between the true covariance matrix and the estimator. However, this practice offers limited interpretability in terms of information theory. To better understand this relationship, we focus on the Kullback-Leibler divergence to quantify the information lost by the estimator. Our analysis centers on rotationally invariant estimators, which represent the state of the art in random matrix theory, and we derive an analytical expression for their Kullback-Leibler divergence. Due to the intricate nature of the calculations, we use genetic programming regressors paired with human intuition. Ultimately, using this approach, we formulate a conjecture, validated through extensive simulations, that the Frobenius distance corresponds to a first-order expansion term of the Kullback-Leibler divergence, thus establishing a clearer link between the two measures.
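For reference, the two discrepancy measures compared above have standard closed forms for zero-mean multivariate Gaussians. The display below is included only as an illustrative sketch; the symbols $\Sigma$ (true covariance), $\Xi$ (estimator), and $N$ (dimension) are introduced here for convenience and need not match the notation used in the body of the paper.
\begin{align}
  % Kullback-Leibler divergence between two zero-mean Gaussians
  % with covariances \Sigma (true) and \Xi (estimator).
  D_{\mathrm{KL}}\bigl(\mathcal{N}(0,\Sigma)\,\big\|\,\mathcal{N}(0,\Xi)\bigr)
    &= \tfrac{1}{2}\Bigl[\operatorname{tr}\bigl(\Xi^{-1}\Sigma\bigr) - N
       + \ln\det\Xi - \ln\det\Sigma\Bigr],\\
  % Squared Frobenius distance between the same two matrices.
  d_{F}^{2}(\Sigma,\Xi)
    &= \operatorname{tr}\bigl[(\Sigma-\Xi)^{2}\bigr]
     = \lVert \Sigma - \Xi \rVert_{F}^{2}.
\end{align}
The abstract's conjecture concerns how the second quantity emerges as a leading-order term when the first is expanded around $\Xi=\Sigma$; the precise form of that expansion is derived in the paper itself.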