We consider the problem of estimating the principal subspace (the span of the top r singular vectors) of a symmetric matrix in a federated setting, where each node has access to estimates of this matrix, and we study how to make this estimation Byzantine-resilient. We introduce Subspace-Median, a novel algorithm for this problem that is provably Byzantine-resilient, communication-efficient, and private. We also study the most natural solution, a geometric-median-based modification of the federated power method, and explain why it is not useful. We consider two special cases of the resilient subspace estimation meta-problem: federated principal components analysis (PCA) and the spectral initialization step of horizontally federated low rank column-wise sensing (LRCCS). For both problems we show that Subspace-Median provides a resilient solution that is also communication-efficient, and we develop Median-of-Means extensions. Extensive simulation experiments corroborate our theoretical guarantees. Our second contribution is a complete AltGDmin-based algorithm for Byzantine-resilient horizontally federated LRCCS, together with guarantees for it. The algorithm uses a geometric median-of-means estimator to aggregate the partial gradients computed at each node, and uses Subspace-Median for initialization.
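The abstract names its aggregation tools but does not spell out the steps, so the following is only an illustrative sketch of how a geometric median can be used to combine subspace estimates robustly. It assumes each node sends an n x r basis with orthonormal columns, that the center takes the geometric median of the corresponding projection matrices using Weiszfeld iterations, and that it returns the basis of the node whose projector is nearest to that median. The function names, the toy dimensions, and the way the Byzantine nodes are simulated are all hypothetical choices made for this example, not details taken from the paper.

```python
import numpy as np

def geometric_median(points, n_iter=200, tol=1e-9):
    """Approximate the geometric median of the rows of `points`
    with Weiszfeld's fixed-point iteration."""
    z = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(n_iter):
        dist = np.linalg.norm(points - z, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        w = 1.0 / dist
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

def aggregate_subspaces(bases):
    """Assumed aggregation rule (illustrative only): take the geometric
    median of the vectorized projectors U U^T, then return the node basis
    whose projector is closest to that median."""
    projs = np.stack([(U @ U.T).ravel() for U in bases])
    gm = geometric_median(projs)
    best = int(np.argmin(np.linalg.norm(projs - gm, axis=1)))
    return bases[best], best

# Toy setup (all values are assumptions for the demo).
rng = np.random.default_rng(0)
n, r, num_nodes, n_byz = 20, 2, 9, 3
U_true, _ = np.linalg.qr(rng.standard_normal((n, r)))

bases = []
for node in range(num_nodes):
    if node < n_byz:
        M = rng.standard_normal((n, r))  # Byzantine node: arbitrary subspace
    else:
        M = U_true + 0.01 * rng.standard_normal((n, r))  # honest, slightly noisy
    Q, _ = np.linalg.qr(M)  # orthonormalize the columns
    bases.append(Q)

U_hat, chosen = aggregate_subspaces(bases)
# Frobenius distance between estimated and true projectors.
err = np.linalg.norm(U_hat @ U_hat.T - U_true @ U_true.T)
print(f"chosen node: {chosen}, projector error: {err:.3f}")
```

Working with projectors rather than the bases themselves sidesteps the fact that a basis is only defined up to rotation: two nodes can report the same subspace with different matrices U, but their projectors U U^T coincide.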