We compare several algorithms for inferring covariance and precision matrices from small datasets of real vectors, with the length and dimension typical of human brain activity time series recorded by functional Magnetic Resonance Imaging (fMRI). Assuming a Gaussian model for the underlying neural activity, the problem consists of denoising the empirically observed matrices to obtain better estimators of the true precision and covariance matrices. We consider several standard noise-cleaning algorithms and compare them on two types of datasets. The first type consists of time series of fMRI brain activity of human subjects at rest. The second type consists of synthetic time series sampled from a generative Gaussian model, in which we can vary both the ratio of dimensions to samples, q = N/T, and the strength of the off-diagonal correlations. The reliability of each algorithm is assessed in terms of test-set likelihood and, for synthetic data, of the distance from the true precision matrix. We observe that the so-called Optimal Rotationally Invariant Estimator, based on Random Matrix Theory, yields a significantly lower distance from the true precision matrix in synthetic data and a higher test likelihood in real fMRI data. We propose a variant of the Optimal Rotationally Invariant Estimator in which one of its parameters is optimised by cross-validation. In the severe undersampling regime (large q) typical of fMRI series, this variant outperforms all the other estimators. We furthermore propose a simple algorithm based on iterative likelihood gradient ascent, which provides accurate estimates for weakly correlated datasets.
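The evaluation protocol described above (sample from a Gaussian model at a given q = N/T, clean the sample covariance, score the resulting precision matrix by test-set likelihood) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the values of N and T, the construction of `true_cov`, and in particular the cleaning step, which uses plain linear shrinkage towards a scaled identity as a simple stand-in and is not the Optimal Rotationally Invariant Estimator or any other algorithm from this work.

```python
import numpy as np

def gaussian_test_loglik(precision, test_data):
    """Average per-sample log-likelihood of zero-mean test data
    under a Gaussian with the given precision matrix."""
    n = precision.shape[0]
    sign, logdet = np.linalg.slogdet(precision)
    if sign <= 0:
        return -np.inf  # not positive definite: likelihood undefined
    test_cov = test_data.T @ test_data / test_data.shape[0]
    return 0.5 * (logdet - np.trace(precision @ test_cov) - n * np.log(2 * np.pi))

def linear_shrinkage(sample_cov, alpha):
    """Stand-in cleaning step (assumption): mix the sample covariance
    with a scaled identity, with shrinkage intensity alpha in [0, 1]."""
    n = sample_cov.shape[0]
    target = np.trace(sample_cov) / n * np.eye(n)
    return (1.0 - alpha) * sample_cov + alpha * target

rng = np.random.default_rng(0)
N, T = 40, 80  # illustrative sizes, giving q = N/T = 0.5

# Hypothetical generative model: identity plus weak random correlations.
A = rng.standard_normal((N, N))
true_cov = np.eye(N) + 0.1 * (A @ A.T) / N
chol = np.linalg.cholesky(true_cov)

# Independent training and test sets, each with T samples of dimension N.
train = rng.standard_normal((T, N)) @ chol.T
test = rng.standard_normal((T, N)) @ chol.T

sample_cov = train.T @ train / T

# Score each shrinkage intensity by held-out likelihood.
results = {}
for alpha in (0.0, 0.2, 0.5, 0.8):
    precision_hat = np.linalg.inv(linear_shrinkage(sample_cov, alpha))
    results[alpha] = gaussian_test_loglik(precision_hat, test)

best_alpha = max(results, key=results.get)
print(results, best_alpha)
```

Selecting `alpha` by held-out likelihood mirrors, in a much simpler setting, the idea of tuning an estimator's parameter by cross-validation; a proper cross-validation would average the score over several train/test splits rather than use a single one.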