A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency

Federated learning (FL) has emerged as an effective paradigm for privacy-preserving collaborative training among multiple parties. Unlike traditional centralized learning, which requires collecting data from each party, FL allows clients to share privacy-preserving information without exposing their private datasets. This approach not only strengthens privacy protection but also facilitates more efficient and secure collaboration among participants. FL has therefore attracted considerable attention from researchers, prompting numerous surveys that summarize related work. However, most of these surveys concentrate on methods that share model parameters during training and overlook the potential of sharing other forms of local information. In this paper, we present a systematic survey from a new perspective, i.e., what to share in FL, with an emphasis on model utility, privacy leakage, and communication efficiency. This survey differs from previous ones in four contributions. First, we present a new taxonomy of FL methods based on what is shared, comprising three categories: model sharing, synthetic data sharing, and knowledge sharing. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms that provide certain privacy guarantees. Third, we conduct extensive experiments to compare the performance and communication overhead of various sharing methods in FL. In addition, we assess potential privacy leakage through model inversion and membership inference attacks and compare the effectiveness of various defense approaches. Finally, we discuss potential deficiencies in current methods and outline future directions for improvement.

