Federated learning (FL) has great potential for large-scale machine learning (ML) without exposing raw data. Differential privacy (DP) is the de facto standard of privacy protection with provable guarantees. Advances in ML suggest that DP would be a perfect fit for FL, offering comprehensive privacy preservation. Hence, extensive efforts have been devoted to achieving practically usable FL with DP, which nevertheless remains challenging. Practitioners are often not fully aware of its development and categorization, and they also face a hard choice between privacy and utility. This calls for a holistic review of current advances and an investigation of the challenges and opportunities for highly usable FL systems with a DP guarantee. In this article, we first introduce the primary concepts of FL and DP, and highlight the benefits of their integration. We then review current developments by categorizing different paradigms and notions. Aiming at usable FL with DP, we present optimization principles for seeking a better tradeoff between model utility and privacy loss. Finally, we discuss future challenges in emerging areas and relevant research topics.