Estimating a covariance matrix is central to high-dimensional data analysis. Empirical analyses of high-dimensional biomedical data, including genomics, proteomics, microbiome, and neuroimaging data, consistently reveal strong modularity in their dependence patterns: intercorrelated features often form communities, or modules, that may in turn be interconnected with one another. Although this interconnected community structure has been studied extensively in biomedical research (e.g., in gene co-expression networks), its potential to aid covariance matrix estimation remains largely unexplored. To address this gap, we propose a procedure that leverages the interconnected community structure commonly observed in high-dimensional biomedical data to estimate large covariance and precision matrices. We derive closed-form uniformly minimum-variance unbiased estimators of the covariance and precision matrices and establish their asymptotic properties. The proposed method improves the accuracy of covariance and precision matrix estimation and outperforms competing methods in both simulations and real data analyses.
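The abstract does not spell out the estimator, so the following is only a minimal sketch of how known community structure can be exploited. It makes two assumptions that are ours and not necessarily the paper's model. First, the community labels are known. Second, the covariance has a block structure: variances are shared within a community, and off-diagonal covariances are shared within each pair of communities. Under these assumptions, one natural estimator pools the sample covariance entries block by block. The helper names `sample_covariance` and `block_average_covariance` are hypothetical. This block-averaging is shown for illustration and is not claimed to be the authors' closed-form UMVUE.

```python
def sample_covariance(X):
    """Unbiased sample covariance (divisor n - 1) of an n x p data matrix given as nested lists."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def block_average_covariance(S, labels):
    """Pool sample covariance entries by community block (hypothetical illustration).

    Assumed model (not taken from the source): variables in community k share one
    variance, and every pair of distinct variables in communities (k, l) shares one
    covariance. `labels[i]` is the community index (0..K-1) of variable i.
    """
    p = len(S)
    K = max(labels) + 1
    diag_sum, diag_cnt = [0.0] * K, [0] * K
    off_sum = [[0.0] * K for _ in range(K)]
    off_cnt = [[0] * K for _ in range(K)]

    # Accumulate diagonal entries per community and off-diagonal entries per block pair.
    for i in range(p):
        ki = labels[i]
        diag_sum[ki] += S[i][i]
        diag_cnt[ki] += 1
        for j in range(p):
            if i != j:
                kj = labels[j]
                off_sum[ki][kj] += S[i][j]
                off_cnt[ki][kj] += 1

    # Replace every entry by the average over its block (diagonal and off-diagonal separately).
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            ki, kj = labels[i], labels[j]
            if i == j:
                sigma[i][j] = diag_sum[ki] / diag_cnt[ki]
            else:
                sigma[i][j] = off_sum[ki][kj] / off_cnt[ki][kj]
    return sigma


# Usage: four observations of three variables, where variables 0 and 1 form one community.
X = [[1.0, 2.0, 0.5], [2.0, 3.5, 1.0], [0.0, 1.0, 2.0], [3.0, 2.5, 0.0]]
S = sample_covariance(X)
Sigma_hat = block_average_covariance(S, labels=[0, 0, 1])
```

With K communities, this assumed structure reduces the number of free covariance parameters from p(p + 1)/2 to K + K(K + 1)/2. That reduction is the intuition behind why community structure can stabilize estimation when p is large relative to n.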