Federated learning (FL) has emerged as an effective paradigm for privacy-preserving collaborative training among multiple parties. Unlike traditional centralized learning, which requires collecting data from each party, FL allows clients to share privacy-preserving information without exposing their private datasets. This approach not only provides enhanced privacy protection but also facilitates more efficient and secure collaboration among participants. Consequently, FL has attracted considerable attention from researchers, prompting numerous surveys that summarize related work. However, most of these surveys concentrate on methods that share model parameters during training and overlook the potential of sharing other forms of local information. In this paper, we present a systematic survey from a new perspective, namely what to share in FL, with an emphasis on model utility, privacy leakage, and communication efficiency. This survey differs from previous ones through four contributions. First, we present a new taxonomy of FL methods based on the type of shared information, comprising three categories: model sharing, synthetic data sharing, and knowledge sharing. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms that provide certain privacy guarantees. Third, we conduct extensive experiments to compare the performance and communication overhead of various sharing methods in FL. In addition, we assess the potential privacy leakage through model inversion and membership inference attacks, and compare the effectiveness of various defense approaches. Finally, we discuss potential deficiencies in current methods and outline future directions for improvement.