Federated Learning (FL) is a machine learning approach in which multiple clients collaboratively train a shared model without sharing their raw data. However, current FL systems are all-in-one solutions that tightly couple data management with computation, which can hinder the wide adoption of FL in certain domains, such as scientific applications. To overcome this limitation, this paper proposes a decoupling approach that enables clients to customize FL applications with specific data subsystems. To evaluate this approach, the authors develop a framework called Data-Decoupling Federated Learning (DDFL) and compare it with state-of-the-art FL systems that tightly couple data management and computation. Extensive experiments on various datasets and data management subsystems show that DDFL achieves comparable or better performance in terms of training time, inference accuracy, and database query time. Moreover, DDFL gives clients more options for tuning their FL applications with respect to data-related metrics. The authors also provide a detailed qualitative analysis of DDFL when integrated with mainstream database systems.
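To make the decoupling idea concrete, the following is a minimal sketch, not DDFL's actual design or API. It assumes a hypothetical `DataSubsystem` interface that is the only thing the client's training code depends on, so storage backends can be swapped without touching the FL logic. The two backends (`InMemoryData` and an SQLite-backed `SQLiteData` using Python's standard `sqlite3` module), the toy 1-D linear model, and the FedAvg-style weighted averaging in `fedavg_round` are all illustrative assumptions; the abstract does not specify the aggregation rule or the interface.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple

Sample = Tuple[float, float]  # (feature x, label y) for a toy 1-D regression


class DataSubsystem(ABC):
    """Hypothetical interface: all the FL client needs from its data store."""

    @abstractmethod
    def batches(self, batch_size: int) -> Iterator[List[Sample]]:
        ...


class InMemoryData(DataSubsystem):
    """Backend 1 (hypothetical): samples held in a Python list."""

    def __init__(self, samples: List[Sample]):
        self.samples = list(samples)

    def batches(self, batch_size: int) -> Iterator[List[Sample]]:
        for i in range(0, len(self.samples), batch_size):
            yield self.samples[i:i + batch_size]


class SQLiteData(DataSubsystem):
    """Backend 2 (hypothetical): samples stored in an in-memory SQLite table."""

    def __init__(self, samples: List[Sample]):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, x REAL, y REAL)")
        self.conn.executemany("INSERT INTO samples (x, y) VALUES (?, ?)", samples)

    def batches(self, batch_size: int) -> Iterator[List[Sample]]:
        # Stream rows in insertion order, batch_size rows per query fetch.
        cur = self.conn.execute("SELECT x, y FROM samples ORDER BY id")
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            yield [(float(x), float(y)) for x, y in rows]


def local_train(w: float, data: DataSubsystem, lr: float = 0.1,
                epochs: int = 5, batch_size: int = 2) -> float:
    """Client update: mini-batch SGD on y ~ w * x; unaware of the storage backend."""
    for _ in range(epochs):
        for batch in data.batches(batch_size):
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w


def fedavg_round(w_global: float, clients: List[DataSubsystem], sizes: List[int]) -> float:
    """One FedAvg-style round: average local models weighted by client data size."""
    local_models = [local_train(w_global, c) for c in clients]
    total = sum(sizes)
    return sum(w * n for w, n in zip(local_models, sizes)) / total


# Usage: two clients with different storage backends, data drawn from y = 3x.
samples = [(0.5, 1.5), (1.0, 3.0), (1.5, 4.5), (2.0, 6.0)]
clients = [InMemoryData(samples[:2]), SQLiteData(samples[2:])]
w = 0.0
for _ in range(10):
    w = fedavg_round(w, clients, sizes=[2, 2])
print(round(w, 3))
```

Because `local_train` only calls `batches`, a client could plug in another database by writing one more subclass; the training and aggregation code stays unchanged.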