Federated learning (FL) is an emerging machine learning (ML) training paradigm where clients own their data and collaborate to train a global model without revealing any data to the server or to other participants. Researchers commonly perform experiments in a simulation environment to quickly iterate on ideas. However, existing open-source tools do not offer the efficiency required to simulate FL on larger and more realistic FL datasets. We introduce pfl-research, a fast, modular, and easy-to-use Python framework for simulating FL. It supports TensorFlow, PyTorch, and non-neural-network models, and is tightly integrated with state-of-the-art privacy algorithms. We study the speed of open-source FL frameworks and show that pfl-research is 7-72$\times$ faster than alternative open-source frameworks on common cross-device setups. Such a speedup will significantly boost the productivity of the FL research community and enable testing hypotheses on realistic FL datasets that were previously too resource-intensive. We also release a suite of benchmarks that evaluates an algorithm's overall performance on a diverse set of realistic scenarios. The code is available on GitHub at https://github.com/apple/pfl-research.