Distributed Federated Learning (DFL) enables decentralized model training across large-scale systems without a central parameter server. However, DFL faces three critical challenges: privacy leakage to honest-but-curious neighbors, slow convergence due to the lack of central coordination, and vulnerability to Byzantine adversaries that aim to degrade model accuracy. To address these issues, we propose a novel DFL framework that integrates Byzantine robustness, privacy preservation, and convergence acceleration. Within this framework, each device trains a local model using a Bayesian approach and independently selects an optimal subset of neighbors for posterior exchange. We formulate neighbor selection as an optimization problem that minimizes the global loss function under security and privacy constraints. Solving this problem is challenging because each device possesses only partial network information, and the complex coupling among topology, security, and convergence is not well understood. To bridge this gap, we first analytically characterize the trade-offs among dynamic connectivity, Byzantine detection, privacy levels, and convergence speed. Leveraging these insights, we develop a fully distributed Graph Neural Network (GNN)-based Reinforcement Learning (RL) algorithm that enables devices to make autonomous connection decisions from local observations. Simulation results demonstrate that our method achieves superior robustness and efficiency with significantly lower overhead than traditional security and privacy schemes.