Potential privacy leakage during centralized AI model training has drawn intense public concern. Federated Learning (FL), a Parallel and Distributed Computing (PDC) scheme, has emerged as a new paradigm that addresses this privacy issue by allowing clients to train models locally, without uploading their sensitive personal data. In FL, the number of clients can be very large, while the bandwidth available for model distribution and re-upload is quite limited, so it is sensible to involve only a subset of the volunteers in the training process. The client selection policy is critical to an FL process in terms of training efficiency, final model quality, and fairness. In this paper, we model fairness-guaranteed client selection as a Lyapunov optimization problem and propose a C2MAB-based method to estimate the model exchange time between each client and the server, based on which we design a fairness-guaranteed algorithm, termed RBCS-F, to solve the problem. The regret of RBCS-F is strictly bounded by a finite constant, which justifies its theoretical feasibility. Beyond the theoretical results, further empirical evidence is obtained from our real training experiments on public datasets.
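To make the idea of fairness-guaranteed selection concrete, the following is a minimal sketch of a Lyapunov-style (drift-plus-penalty) selection rule, not the RBCS-F algorithm itself. It assumes a simplified setting: each client has a virtual queue that grows by a target selection rate and drains when the client is selected, and the model exchange time is estimated with a plain, non-contextual confidence-bound rule as a stand-in for the C2MAB estimator. All names and values (`select_clients`, `update_queues`, `simulate`, `beta`, `v`, the synthetic exchange times) are hypothetical and chosen only for illustration.

```python
import math
import random


def select_clients(queues, est_time, counts, round_idx, m, v):
    """Pick m clients by a drift-plus-penalty score (illustrative only)."""
    scores = {}
    for c in queues:
        # Optimistic (lower-confidence) time estimate: rarely tried clients
        # receive a larger exploration bonus.
        bonus = math.sqrt(2.0 * math.log(round_idx + 1) / max(counts[c], 1))
        optimistic_time = max(est_time[c] - bonus, 0.0)
        # A large queue backlog (under-selected client) raises the score;
        # a long estimated exchange time lowers it, weighted by v.
        scores[c] = queues[c] - v * optimistic_time
    return sorted(scores, key=scores.get, reverse=True)[:m]


def update_queues(queues, selected, beta):
    """Each virtual queue grows by beta and drains by 1 when its client is selected."""
    for c in queues:
        served = 1.0 if c in selected else 0.0
        queues[c] = max(queues[c] + beta - served, 0.0)


def simulate(num_clients=6, m=2, beta=0.2, v=1.0, rounds=2000, seed=0):
    """Run the sketch on synthetic exchange times; return per-client selection rates."""
    rng = random.Random(seed)
    # Hypothetical mean exchange times: client 0 is fastest, the last is slowest.
    true_time = {c: 0.2 + 0.3 * c for c in range(num_clients)}
    queues = {c: 0.0 for c in true_time}
    est_time = {c: 0.0 for c in true_time}
    counts = {c: 0 for c in true_time}
    for t in range(1, rounds + 1):
        selected = select_clients(queues, est_time, counts, t, m, v)
        for c in selected:
            observed = true_time[c] + rng.uniform(-0.1, 0.1)  # noisy feedback
            counts[c] += 1
            est_time[c] += (observed - est_time[c]) / counts[c]  # running mean
        update_queues(queues, selected, beta)
    return {c: counts[c] / rounds for c in counts}
```

Calling `simulate()` returns the fraction of rounds in which each client was selected. In this sketch the weight `v` trades off speed against fairness: a larger `v` favors fast clients more strongly, while the virtual queues push every client's long-run selection rate toward at least `beta`, provided `num_clients * beta <= m`.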