Federated learning (FL) enables a set of geographically distributed clients to collectively train a model through a server. Classically, the training process is synchronous, but it can be made asynchronous to maintain its speed in the presence of slow clients and heterogeneous networks. The vast majority of Byzantine fault-tolerant FL systems, however, rely on a synchronous training process. Our solution is one of the first Byzantine-resilient and asynchronous FL algorithms that neither requires an auxiliary server dataset nor is delayed by stragglers, two shortcomings of previous works. Intuitively, the server in our solution waits to receive a minimum number of client updates on its latest model before safely updating it, and is later able to safely leverage the updates that late clients might send. We compare the performance of our solution with state-of-the-art algorithms on both image and text datasets under gradient inversion, perturbation, and backdoor attacks. Our results indicate that, in the presence of Byzantine clients, our solution trains a model faster than previous synchronous FL solutions, and maintains a higher accuracy than previous asynchronous FL solutions: up to 1.54x higher under perturbation attacks and up to 1.75x higher under gradient inversion attacks.
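The buffering idea described above can be sketched as follows. This is a hypothetical illustration, not the paper's exact algorithm: the class name, the coordinate-wise-median aggregation rule, and the staleness-decay weighting for late updates are all illustrative assumptions.

```python
import numpy as np

class BufferedAsyncServer:
    """Sketch of a buffered asynchronous FL server: wait for a minimum
    number of updates on the latest model before applying a robust
    aggregate, then damp (rather than drop) updates from late clients.
    Hypothetical names and rules, not the paper's exact algorithm."""

    def __init__(self, model, min_updates=3, lr=1.0, staleness_decay=0.5):
        self.model = np.asarray(model, dtype=float)
        self.version = 0               # increments on every aggregation
        self.min_updates = min_updates # quorum before the model is updated
        self.lr = lr
        self.staleness_decay = staleness_decay
        self.buffer = []               # fresh updates awaiting aggregation

    def receive(self, update, client_version):
        update = np.asarray(update, dtype=float)
        staleness = self.version - client_version
        if staleness == 0:
            # Fresh update: buffer it until the quorum is reached.
            self.buffer.append(update)
            if len(self.buffer) >= self.min_updates:
                # Coordinate-wise median: a standard Byzantine-robust
                # aggregation rule (used here only as an example).
                robust = np.median(np.stack(self.buffer), axis=0)
                self.model = self.model + self.lr * robust
                self.version += 1
                self.buffer = []
        else:
            # Late update: still leveraged, but damped by its staleness.
            weight = self.staleness_decay ** staleness
            self.model = self.model + self.lr * weight * update

server = BufferedAsyncServer(model=[0.0, 0.0], min_updates=3)
server.receive([1.0, 0.0], client_version=0)
server.receive([1.0, 0.0], client_version=0)
server.receive([100.0, 0.0], client_version=0)  # outlier, suppressed by median
# model is now [1.0, 0.0] and version is 1
server.receive([2.0, 2.0], client_version=0)    # stale by 1, weighted by 0.5
# model is now [2.0, 1.0]
```

Note how the outlier update is suppressed by the median, and the late client's update still contributes, only with a reduced weight, so stragglers neither block nor are wasted.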