Byzantine resilience and communication efficiency have both attracted considerable attention recently because of their importance in edge federated learning. However, most existing algorithms may fail on real-world irregular data that follows a heavy-tailed distribution. To address this issue, we study stochastic convex and non-convex optimization problems for federated learning at the edge and show how to handle heavy-tailed data while simultaneously retaining Byzantine resilience, communication efficiency, and optimal statistical error rates. Specifically, we first present a Byzantine-resilient distributed gradient descent algorithm that handles heavy-tailed data and converges under standard assumptions. To reduce communication overhead, we then propose a second algorithm that incorporates gradient compression. Theoretical analysis shows that our algorithms achieve an order-optimal statistical error rate in the presence of Byzantine devices. Finally, we conduct extensive experiments on both synthetic and real-world datasets to verify the efficacy of our algorithms.
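
The abstract does not specify the aggregation rule or the compressor, so the following is only an illustrative sketch of the general idea and not the algorithms proposed here. It assumes two standard stand-ins: top-k sparsification as the gradient compressor and a coordinate-wise median as the robust aggregator. The names `top_k_sparsify`, `coordinate_median`, and `robust_step` are hypothetical.

```python
from statistics import median


def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude coordinates of grad and zero the rest.

    Assumed compressor (top-k); the source only says "gradient compression".
    """
    if k >= len(grad):
        return list(grad)
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def coordinate_median(grads):
    """Coordinate-wise median of equally sized gradient vectors.

    Assumed robust aggregator; a minority of arbitrarily corrupted
    vectors cannot move the median of a coordinate arbitrarily far.
    """
    return [median(column) for column in zip(*grads)]


def robust_step(weights, device_grads, lr, k):
    """One server-side descent step on compressed, robustly aggregated gradients."""
    compressed = [top_k_sparsify(g, k) for g in device_grads]
    aggregate = coordinate_median(compressed)
    return [w - lr * a for w, a in zip(weights, aggregate)]


# Four honest devices and one Byzantine device sending an extreme vector.
device_grads = [
    [1.0, 2.0],
    [1.2, 1.8],
    [0.8, 2.2],
    [1.1, 2.1],
    [1e6, -1e6],  # Byzantine report
]
new_weights = robust_step([0.0, 0.0], device_grads, lr=0.1, k=2)
```

In this toy run the Byzantine report does not dominate the update, because each coordinate's median is taken over a majority of honest values. A plain average of the same five vectors would be pulled to roughly 2e5 in magnitude per coordinate.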