Stochastic Coded Federated Learning: Theoretical Analysis and Incentive Mechanism Design

Federated learning (FL) has achieved great success as a privacy-preserving distributed training paradigm, in which many edge devices collaboratively train a machine learning model by sharing model updates, rather than raw data, with a server. However, the heterogeneous computational and communication resources of edge devices give rise to stragglers that significantly slow down the training process. To mitigate this issue, we propose a novel FL framework named stochastic coded federated learning (SCFL) that leverages coded computing techniques. In SCFL, before training starts, each edge device uploads to the server a privacy-preserving coded dataset, generated by adding Gaussian noise to a projection of its local dataset. During training, the server computes gradients on the global coded dataset to compensate for the missing model updates of the straggling devices. We design a gradient aggregation scheme that ensures the aggregated model update is an unbiased estimate of the desired global update. Moreover, this aggregation scheme enables periodic model averaging, which improves training efficiency. We characterize the tradeoff between the convergence performance and the privacy guarantee of SCFL. In particular, a noisier coded dataset provides stronger privacy protection for edge devices but degrades learning performance. We further develop a contract-based incentive mechanism to reconcile this conflict. The simulation results show that SCFL learns a better model within a given training time and achieves a better privacy-performance tradeoff than the baseline methods. In addition, the proposed incentive mechanism yields better training performance than the conventional Stackelberg game approach.
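
As an illustration of the coded dataset described above, the following is a minimal sketch of how one edge device might project its local data and add Gaussian noise before uploading. The abstract does not specify the projection, so the i.i.d. Gaussian projection matrix, its scaling, and the names make_coded_dataset, coded_size, and noise_std are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def make_coded_dataset(X, Y, coded_size, noise_std, rng):
    """Return (X_coded, Y_coded): a random projection of the local data plus Gaussian noise.

    Hypothetical helper: the projection distribution and scaling are assumptions.
    """
    n = X.shape[0]
    # Assumed projection: i.i.d. Gaussian entries with variance 1/coded_size,
    # so that the expectation of G.T @ G is the identity matrix.
    G = rng.normal(0.0, 1.0 / np.sqrt(coded_size), size=(coded_size, n))
    # Additive Gaussian noise; a larger noise_std hides the local data more
    # but distorts the coded dataset more.
    X_coded = G @ X + rng.normal(0.0, noise_std, size=(coded_size, X.shape[1]))
    Y_coded = G @ Y + rng.normal(0.0, noise_std, size=(coded_size, Y.shape[1]))
    return X_coded, Y_coded


# Usage: 200 local samples with 5 features, compressed to 50 coded samples.
rng = np.random.default_rng(0)
X_local = rng.normal(size=(200, 5))
Y_local = rng.normal(size=(200, 1))
X_coded, Y_coded = make_coded_dataset(X_local, Y_local, coded_size=50, noise_std=0.5, rng=rng)
print(X_coded.shape, Y_coded.shape)
```

In this sketch, noise_std is the knob behind the tradeoff the abstract describes: raising it makes the uploaded data less informative about the raw samples, at the cost of noisier gradients computed on the coded dataset.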
