Federated learning (FL) has emerged as a practical means for privacy-preserving distributed machine learning. FL's versatile design makes it suitable for various training settings, from IoT edge devices in cross-device FL to powerful servers in cross-silo FL. A key consequence of this versatility is the high diversity of networking configurations across FL applications. Combined with the rising demand for large-scale models such as large language models, this diversity makes well-informed selection and configuration of communication backends crucial for achieving optimal performance in FL systems. This work focuses on cross-silo federated learning and presents in-depth benchmarks of several communication backends, including MPI, gRPC, and PyTorch RPC. In addition, we introduce gRPC+S3, a hybrid backend designed to overcome the limitations of existing approaches, particularly when transmitting large models across geo-distributed deployments; it achieves up to $3.8\times$ end-to-end speedup over gRPC. Our benchmarks examine point-to-point and end-to-end performance across a broad range of model sizes under realistic network conditions. Our findings provide practical insights for selecting and configuring communication backends suited to specific federated learning tasks and network configurations.