Understanding Communication Backends in Cross-Silo Federated Learning

Federated learning (FL) has emerged as a practical means for privacy-preserving distributed machine learning. FL's versatile design makes it suitable for a range of training settings, from IoT edge devices in cross-device FL to powerful servers in cross-silo FL. A key consequence of this versatility is the high diversity of networking configurations found in FL applications. Coupled with the rising demand for large-scale models such as large language models, well-informed selection and configuration of communication backends become crucial for good performance in FL systems. This work focuses on cross-silo federated learning and presents in-depth benchmarks of several communication backends, including MPI, gRPC, and PyTorch RPC. In addition, we introduce gRPC+S3, a hybrid backend designed to overcome the limitations of existing approaches, particularly when transmitting large models across geo-distributed deployments, where it achieves up to $3.8\times$ end-to-end speedup over gRPC. Our benchmarks examine point-to-point and end-to-end performance for a broad range of model sizes under realistic network conditions. Our findings provide practical guidance for selecting and configuring communication backends suited to specific federated learning tasks and network configurations.
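
The abstract does not describe how gRPC+S3 works internally. As a rough illustration of what a hybrid backend of this kind might look like, the sketch below assumes one plausible split: the large model payload goes to an object store, and only a small control message (carrying the object key and a checksum) travels over the RPC channel. Everything here is hypothetical and for illustration only: `InMemoryObjectStore` stands in for S3, `ControlMessage` stands in for a gRPC message, and `send_update` / `receive_update` are invented names, not the authors' API.

```python
import hashlib
import json
from dataclasses import dataclass


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store such as S3."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


@dataclass
class ControlMessage:
    """Small message that would travel over the RPC channel (assumed schema)."""
    sender: int
    round_num: int
    object_key: str
    sha256: str


def send_update(store, sender: int, round_num: int, weights: dict) -> ControlMessage:
    # Serialize the (potentially large) payload and place it in the object store.
    payload = json.dumps(weights).encode("utf-8")
    key = f"round-{round_num}/client-{sender}"
    store.put(key, payload)
    # Only the key and a checksum go into the control message.
    digest = hashlib.sha256(payload).hexdigest()
    return ControlMessage(sender, round_num, key, digest)


def receive_update(store, msg: ControlMessage) -> dict:
    # Fetch the payload by key and verify it matches what the sender uploaded.
    payload = store.get(msg.object_key)
    if hashlib.sha256(payload).hexdigest() != msg.sha256:
        raise ValueError("payload checksum mismatch")
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    store = InMemoryObjectStore()
    msg = send_update(store, sender=1, round_num=3, weights={"layer1": [0.1, 0.2]})
    print(msg.object_key, receive_update(store, msg))
```

Under this assumed design, the RPC message stays small regardless of model size, which is one reason such a split could help with large models over wide-area links. Whether the paper's backend works this way is not stated in the abstract.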

