Federated learning (FL) enables collaborative training of machine learning models without sharing training data. Traditional FL relies heavily on a trusted centralized server. Decentralized FL removes this central dependence, but it may worsen other inherent problems of FL: insufficient restrictions on participant behavior aggravate poisoning attacks and data representation leakage, and communication cost becomes heavy, especially in fully decentralized, i.e., peer-to-peer (P2P), settings. In this paper, we propose BlockDFL, a blockchain-based fully decentralized P2P framework for FL. Built on blockchain, it leverages the proposed PBFT-based voting mechanism and two-layer scoring mechanism to coordinate FL among peer participants without mutual trust, while effectively defending against poisoning attacks. Gradient compression is introduced to lower communication cost and to prevent data from being reconstructed from transmitted model updates. Extensive experiments on two real-world datasets show that BlockDFL achieves accuracy competitive with centralized FL and defends against poisoning attacks while remaining efficient and scalable. In particular, even when the proportion of malicious participants is as high as 40%, BlockDFL preserves the accuracy of FL, outperforming existing blockchain-based fully decentralized P2P FL frameworks.
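To make the gradient compression step concrete, the following is a minimal sketch of one common scheme, top-k sparsification, in which a participant transmits only the k largest-magnitude entries of its update. The abstract does not state which compression scheme BlockDFL uses, so the choice of top-k, the function names `topk_sparsify` and `densify`, and the parameter `k` are all assumptions made for illustration.

```python
import numpy as np


def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a gradient (assumed scheme).

    Returns the flat indices, the kept values, and the original shape,
    which together are what a participant would transmit.
    """
    flat = grad.ravel()
    k = max(1, min(k, flat.size))
    # argpartition places the k largest magnitudes in the last k slots
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx], grad.shape


def densify(idx, values, shape):
    """Rebuild a dense update on the receiver; dropped entries become zero."""
    out = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    out[idx] = values
    return out.reshape(shape)


# Hypothetical 2x3 update; only 2 of its 6 entries are transmitted.
grad = np.array([[0.1, -2.0, 0.3], [4.0, -0.05, 0.2]])
idx, values, shape = topk_sparsify(grad, k=2)
restored = densify(idx, values, shape)
```

Under this assumed scheme, the receiver sees only the sparse update, which reduces the transmitted volume and withholds most coordinates of the original gradient from anyone attempting to reconstruct training data from it.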