Estimating a covariance matrix is central to high-dimensional data analysis. The proposed method is motivated by dependence pattern analyses of multiple types of high-dimensional biomedical data, including but not limited to genomics, proteomics, microbiome, and neuroimaging data. The correlation matrices of these biomedical data all demonstrate a well-organized block pattern, in which the positive and negative pairwise correlations with large absolute values are mainly concentrated within diagonal and off-diagonal blocks. We develop a covariance- and precision-matrix estimation framework to fully leverage this organized block pattern. We propose new best unbiased covariance- and precision-matrix estimators in closed form, and develop theory for the asymptotic properties of the estimators in both scenarios, where the number of blocks is either smaller or larger than the sample size. Simulations and data example analyses show that our method is robust and improves the accuracy of covariance- and precision-matrix estimation.