The assumption of independent subvectors arises in many aspects of multivariate analysis. In most real-world applications, however, we lack prior knowledge about the number of subvectors and the specific variables within each subvector, and testing all possible combinations is infeasible. For a data matrix containing 15 variables, for example, there are already 1 382 958 545 possible combinations. Because zero correlation is a necessary condition for independence, independent subvectors exhibit a block diagonal covariance matrix. This paper focuses on the detection of such block diagonal covariance structures in high-dimensional data and therefore also identifies uncorrelated subvectors. Our nonparametric approach exploits the fact that the structure of the covariance matrix is mirrored by the structure of its eigenvectors. In the sample case, however, the true block diagonal structure is masked by noise. To address this problem, we propose to use sparse approximations of the sample eigenvectors to reveal the sparse structure of the population eigenvectors. Notably, the right singular vectors of a data matrix with an overall mean of zero are identical to the sample eigenvectors of its covariance matrix. Using sparse approximations of these singular vectors instead of the eigenvectors makes the estimation of the covariance matrix unnecessary. We demonstrate the performance of our method through simulations and provide real data examples. Supplementary materials for this article are available online.
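Three of the statements above can be checked numerically. The following is a minimal Python sketch, not the method proposed in the paper: the function names, the example matrices, and the tolerance are illustrative assumptions, and the data matrix is assumed to be column-centered. It counts the possible groupings of variables into subvectors (a Bell number), compares the right singular vectors of a centered data matrix with the eigenvectors of its sample covariance matrix, and shows that the eigenvectors of a block diagonal population covariance matrix are supported on a single block each.

```python
import numpy as np


def bell_number(n: int) -> int:
    """Number of ways to partition n variables into non-empty subvectors."""
    # Bell triangle: each new row starts with the last entry of the previous
    # row; every further entry adds the previous-row entry to its left neighbor.
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


def max_singular_eigen_gap(X: np.ndarray) -> float:
    """Largest deviation between right singular vectors of the centered data
    and eigenvectors of the sample covariance matrix (compared up to sign)."""
    Xc = X - X.mean(axis=0)  # assumption: column-centering
    n = Xc.shape[0]
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    # eigh returns ascending eigenvalues; SVD returns descending singular values.
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Matching vectors have an absolute inner product of 1.
    vector_gap = np.max(np.abs(np.abs(np.sum(Vt.T * eigvecs, axis=0)) - 1.0))
    # Squared singular values divided by n - 1 are the sample eigenvalues.
    value_gap = np.max(np.abs(s**2 / (n - 1) - eigvals))
    return float(max(vector_gap, value_gap))


def eigenvector_supports(sigma: np.ndarray, tol: float = 1e-8) -> set:
    """Index sets of the non-negligible entries of each eigenvector."""
    _, eigvecs = np.linalg.eigh(sigma)
    return {
        tuple(np.flatnonzero(np.abs(eigvecs[:, j]) > tol).tolist())
        for j in range(eigvecs.shape[1])
    }


if __name__ == "__main__":
    print(bell_number(15))  # 1382958545, the count quoted in the text

    rng = np.random.default_rng(0)
    # Hypothetical data: distinct column scales keep the eigenvalues separated.
    X = rng.standard_normal((200, 5)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0])
    print(max_singular_eigen_gap(X))

    # Hypothetical population covariance with blocks {0, 1} and {2, 3, 4}.
    sigma = np.zeros((5, 5))
    sigma[:2, :2] = [[2.0, 0.8], [0.8, 2.0]]
    sigma[2:, 2:] = [[3.0, 0.5, 0.2], [0.5, 1.5, 0.3], [0.2, 0.3, 0.7]]
    print(eigenvector_supports(sigma))
```

In the population case the supports coincide with the blocks. For a sample covariance matrix the off-block entries are small but nonzero, which is why the text turns to sparse approximations of the singular vectors rather than a fixed tolerance as used here.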