Covariance matrix estimation is central to high-dimensional data analysis. Empirical analyses of high-dimensional biomedical data, such as genomics, proteomics, microbiome, and neuroimaging data, consistently reveal strong modularity in the dependence patterns: intercorrelated features often form communities or modules that can themselves be interconnected. Although this interconnected community structure has been studied extensively in biomedical research (e.g., gene co-expression networks), its potential to assist covariance matrix estimation remains largely unexplored. To address this gap, we propose a procedure that leverages the commonly observed interconnected community structure in high-dimensional biomedical data to estimate large covariance and precision matrices. We derive uniformly minimum variance unbiased estimators of the covariance and precision matrices in closed form and provide theoretical results on their asymptotic properties. The proposed method improves the accuracy of covariance and precision matrix estimation and outperforms competing methods in both simulations and real data analyses.
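To make the idea of exploiting community structure concrete, the following is a minimal sketch and not the estimator derived in this work. It assumes that community labels are known and that covariance entries are constant within each block, so that averaging the sample covariance block by block is a natural plug-in. The helper names `sample_covariance` and `block_average`, and the toy data, are hypothetical and used only for illustration.

```python
from itertools import product


def sample_covariance(rows):
    """Unbiased sample covariance; `rows` holds one list of features per sample."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def block_average(cov, labels):
    """Replace each entry of `cov` by the mean of its block (illustrative assumption).

    Within a community, variances (diagonal) and covariances (off-diagonal)
    are averaged separately; between two communities, all entries are averaged.
    """
    p = len(cov)

    def key(i, j):
        a, b = labels[i], labels[j]
        # Unordered community pair, so blocks (a, b) and (b, a) share one mean.
        return (min(a, b), max(a, b), i == j)

    sums, counts = {}, {}
    for i, j in product(range(p), repeat=2):
        k = key(i, j)
        sums[k] = sums.get(k, 0.0) + cov[i][j]
        counts[k] = counts.get(k, 0) + 1
    return [[sums[key(i, j)] / counts[key(i, j)] for j in range(p)]
            for i in range(p)]


# Toy data: 4 samples, 4 features, two assumed communities {0, 1} and {2, 3}.
rows = [
    [1.0, 2.0, 0.0, 1.0],
    [2.0, 1.0, 1.0, 0.0],
    [3.0, 4.0, 2.0, 3.0],
    [4.0, 3.0, 3.0, 2.0],
]
labels = [0, 0, 1, 1]
estimate = block_average(sample_covariance(rows), labels)
for row in estimate:
    print([round(v, 3) for v in row])
```

Under the stated assumption, the averaged matrix has far fewer free parameters than the raw sample covariance (a handful per pair of communities rather than one per pair of features), which is the intuition behind using community structure to stabilize estimation in high dimensions.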