This work proposes BagChain, a dual-functional blockchain framework for bagging-based decentralized learning. BagChain integrates blockchain with distributed machine learning by replacing the computationally costly hash operations in proof-of-work with machine-learning model training. BagChain uses individual miners' private data samples and limited computing resources to train base models, which may be very weak, and then aggregates them into strong ensemble models. Specifically, we design a three-layer blockchain structure with corresponding generation and validation mechanisms to enable distributed machine learning among uncoordinated miners in an open, permissionless setting. To reduce the computation wasted on blockchain forks, we further propose a cross-fork sharing mechanism for practical networks with long delays. Extensive experiments demonstrate the superiority and efficacy of BagChain across various machine learning tasks on both independent and identically distributed (IID) and non-IID datasets. BagChain remains robust and effective even under constrained local computing capability, heterogeneous private user data, and sparse network connectivity.
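The bagging idea behind the abstract, where each miner trains a weak base model on its own private samples and the base models are then combined into an ensemble, can be illustrated with a minimal sketch. This is not the BagChain protocol: it leaves out the three-layer blockchain structure, block generation and validation, and cross-fork sharing. The names `train_stump`, `ensemble_predict`, and `shards`, the one-dimensional threshold classifier, and the majority-vote aggregation rule are all assumptions chosen for illustration.

```python
from collections import Counter


def train_stump(samples):
    """Fit a weak 1-D base model: predict 1 when x > threshold.

    `samples` is one miner's private list of (x, label) pairs.
    Candidate thresholds are the observed x values; ties keep the first.
    """
    def errors(threshold):
        return sum((x > threshold) != bool(y) for x, y in samples)

    return min((x for x, _ in samples), key=errors)


def ensemble_predict(thresholds, x):
    """Aggregate the base models by majority vote (assumed aggregation rule)."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical private shards, one per miner; no miner sees another's data.
shards = [
    [(0.1, 0), (0.4, 0), (0.7, 1)],
    [(0.2, 0), (0.6, 1), (0.9, 1)],
    [(0.3, 0), (0.8, 1)],
]

# Each miner trains independently on its own shard.
thresholds = [train_stump(shard) for shard in shards]

print(thresholds)
print(ensemble_predict(thresholds, 0.05), ensemble_predict(thresholds, 0.95))
```

Each base model here sees only two or three samples, so its threshold is a rough estimate; the vote across miners is what yields the final prediction. An odd number of binary base models avoids tied votes.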