We study the problem of distributed estimation of the leading singular vectors for a collection of matrices with shared invariant subspaces. In particular, we consider an algorithm that first estimates the projection matrices corresponding to the leading singular vectors of each individual matrix, then computes the average of these projection matrices, and finally returns the leading eigenvectors of the sample average. We show that the algorithm, when applied to (1) parameter estimation for a collection of independent-edge random graphs with shared singular vectors but possibly heterogeneous edge probabilities or (2) distributed PCA for independent sub-Gaussian random vectors with spiked covariance structure, yields estimates whose row-wise fluctuations are normally distributed around the rows of the true singular vectors. Leveraging these results, we also consider a two-sample test for the null hypothesis that a pair of random graphs have the same edge probabilities, and we present a test statistic that converges in distribution to a central (resp. non-central) $\chi^2$ under the null (resp. local alternative) hypothesis.
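The three-step procedure described in the abstract (per-matrix projection estimate, averaging, leading eigenvectors of the average) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `leading_projection` and `aggregate_subspace` are hypothetical, the embedding dimension `d` is assumed known, and only the left singular subspace is handled.

```python
import numpy as np


def leading_projection(A, d):
    """Projection matrix onto the span of the d leading left singular vectors of A."""
    # np.linalg.svd returns singular values in descending order,
    # so the first d columns of U are the leading left singular vectors.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_d = U[:, :d]
    return U_d @ U_d.T


def aggregate_subspace(matrices, d):
    """Average the per-matrix projections and return d leading eigenvectors.

    `matrices` is a list of arrays that all have the same number of rows;
    `d` is the (assumed known) dimension of the shared subspace.
    """
    # Steps 1 and 2: estimate each projection matrix, then average them.
    P_bar = sum(leading_projection(A, d) for A in matrices) / len(matrices)
    # Step 3: P_bar is symmetric, so eigh applies. It returns eigenvalues in
    # ascending order; reverse the columns to put the leading eigenvectors first.
    _, eigvecs = np.linalg.eigh(P_bar)
    return eigvecs[:, ::-1][:, :d]
```

The returned basis is determined only up to an orthogonal transformation, so an estimate is best compared with a reference basis through the projection matrices, for example via the Frobenius norm of `U_hat @ U_hat.T - U @ U.T`.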