We present Project Florida, a system architecture and software development kit (SDK) enabling deployment of large-scale Federated Learning (FL) solutions across a heterogeneous device ecosystem. Federated learning is an approach to machine learning based on a strong data sovereignty principle, i.e., that the privacy and security of data are best preserved by storing it at its origin, whether on end-user devices or in segregated cloud storage silos. Federated learning enables model training across devices and silos while the training data remains within its security boundary. It does so by distributing a model snapshot to a client running inside the boundary, running client code to update the model, and then aggregating updated snapshots from many clients in a central orchestrator. Deploying an FL solution requires implementation of complex privacy and security mechanisms as well as scalable orchestration infrastructure. Scale and performance are paramount concerns, as the model training process benefits from full participation of many client devices, which may have a wide variety of performance characteristics. Project Florida aims to simplify the task of deploying cross-device FL solutions by providing cloud-hosted infrastructure and accompanying task management interfaces, as well as a multi-platform SDK supporting most major programming languages, including C++, Java, and Python, enabling FL training across a wide range of operating system (OS) and hardware specifications. The architecture decouples service management from the FL workflow, enabling a cloud service provider to deliver FL-as-a-service (FLaaS) to ML engineers and application developers. We present an overview of Florida, including a description of the architecture, sample code, and illustrative experiments demonstrating system capabilities.
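The training loop described above (distribute a snapshot, update it on each client, aggregate in the orchestrator) can be illustrated with a minimal sketch. This is not the Florida SDK API: the names `client_update`, `aggregate`, and `run_round` are hypothetical, the model is a toy linear regressor, and the aggregation rule is assumed to be a federated-averaging-style weighted mean, which the abstract does not specify.

```python
from typing import List, Tuple

Weights = List[float]
Dataset = List[Tuple[List[float], float]]  # (features, target) pairs held by one client


def client_update(snapshot: Weights, data: Dataset,
                  lr: float = 0.1, epochs: int = 5) -> Weights:
    """Hypothetical client step: run local SGD on a copy of the snapshot.

    Only the updated weights are returned; the raw data stays on the client.
    """
    w = list(snapshot)
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # Gradient step for squared error on a linear model.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def aggregate(updates: List[Tuple[Weights, int]]) -> Weights:
    """Hypothetical orchestrator step: average client snapshots,
    weighting each by its number of local examples (assumed rule)."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def run_round(global_w: Weights, clients: List[Dataset]) -> Weights:
    """One round: distribute the snapshot, collect updates, aggregate."""
    updates = [(client_update(global_w, data), len(data)) for data in clients]
    return aggregate(updates)


if __name__ == "__main__":
    # Two simulated clients whose data follow y = 2*x0 - 1*x1.
    clients = [
        [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)],
        [([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)],
    ]
    w = [0.0, 0.0]
    for _ in range(50):
        w = run_round(w, clients)
    print(w)  # expected to approach the generating weights [2, -1]
```

In a real deployment the clients would run on separate devices, and the privacy and security mechanisms mentioned above would wrap the exchange of snapshots; none of that is modeled here.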