Principal component analysis (PCA) is a widely used technique for dimension reduction. As datasets continue to grow in size, distributed PCA (DPCA) has become an active research area. A key challenge in DPCA lies in efficiently aggregating results across multiple machines or computing nodes under limited communication and computational overhead. Fan et al. (2019) introduced a pioneering DPCA method to estimate the leading rank-$r$ eigenspace, aggregating local rank-$r$ projection matrices by averaging. However, their method does not utilize eigenvalue information. In this article, we propose a novel DPCA method, called $\beta$-DPCA, that incorporates eigenvalue information by aggregating local results via the matrix $\beta$-mean. The matrix $\beta$-mean offers a flexible and robust aggregation scheme through the adjustable choice of $\beta$: for $\beta=1$, it corresponds to the arithmetic mean; for $\beta=-1$, the harmonic mean; and as $\beta \to 0$, the geometric mean. Moreover, the matrix $\beta$-mean is shown to be associated with the matrix $\beta$-divergence, a subclass of the Bregman matrix divergence, which supports the robustness of $\beta$-DPCA. We also study the stability of eigenvector ordering under eigenvalue perturbation for $\beta$-DPCA. The performance of our proposal is evaluated through numerical studies.
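To make the interpolation between the three classical means concrete, the following sketch computes a matrix $\beta$-mean of symmetric positive semi-definite matrices in the power-mean form $\big(\tfrac{1}{m}\sum_i M_i^{\beta}\big)^{1/\beta}$, whose $\beta \to 0$ limit is the log-Euclidean (geometric) mean. This is an illustrative assumption about the aggregation form, not the paper's exact estimator; the function names are ours.

```python
import numpy as np

def _sym_fun(M, f):
    # Apply a scalar function f to the eigenvalues of a symmetric matrix M.
    vals, vecs = np.linalg.eigh(M)
    return (vecs * f(vals)) @ vecs.T

def matrix_beta_mean(mats, beta, eps=1e-12):
    """Equal-weight matrix beta-mean of symmetric PSD matrices.

    Illustrative power-mean form ((1/m) * sum_i M_i**beta)**(1/beta):
      beta = 1  -> arithmetic mean
      beta = -1 -> harmonic mean
      beta -> 0 -> log-Euclidean (geometric) mean
    """
    m = len(mats)
    clip = lambda v: np.clip(v, eps, None)  # guard near-zero eigenvalues
    if abs(beta) < 1e-8:
        # Geometric-mean limit: exponentiate the average of matrix logs.
        avg_log = sum(_sym_fun(M, lambda v: np.log(clip(v))) for M in mats) / m
        return _sym_fun(avg_log, np.exp)
    avg_pow = sum(_sym_fun(M, lambda v: clip(v) ** beta) for M in mats) / m
    return _sym_fun(avg_pow, lambda v: v ** (1.0 / beta))
```

On commuting diagonal inputs such as $\mathrm{diag}(1,4)$ and $\mathrm{diag}(4,1)$, the three special cases reduce to the familiar scalar arithmetic, harmonic, and geometric means entrywise, which makes the role of $\beta$ as a robustness knob easy to verify numerically.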