Machine learning models should not reveal information about individual training records that is not otherwise accessible. Differential privacy (DP) provides a formal framework for mitigating privacy risks: it ensures that the inclusion or exclusion of any single data point does not significantly alter an algorithm's output, thereby limiting the exposure of private information. This survey reviews the foundational definitions of differential privacy and traces their evolution through key theoretical and applied contributions. It then examines in depth how DP has been integrated into machine learning, analyzing existing proposals and methods for preserving privacy when training ML models. Finally, it describes how DP-based ML techniques can be evaluated in practice. By offering a comprehensive overview of differential privacy in machine learning, this work aims to contribute to the ongoing development of secure and responsible AI systems.
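As a minimal illustration of the guarantee described above, the classic Laplace mechanism releases a query answer with ε-DP by adding noise calibrated to the query's sensitivity (the maximum change one individual can cause). The function name and toy data below are illustrative sketches, not artifacts of the survey itself:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with epsilon-DP by adding Laplace noise
    with scale sensitivity / epsilon (the standard Laplace mechanism)."""
    rng = rng if rng is not None else np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Counting query over a toy dataset: adding or removing one person
# changes the count by at most 1, so the sensitivity is 1.
ages = [34, 29, 51, 47, 38]
noisy_count = laplace_mechanism(len(ages), sensitivity=1.0, epsilon=0.5)
```

Smaller ε means more noise and stronger privacy; the released `noisy_count` is close to the true count of 5 on average, but no single individual's presence can be confidently inferred from it.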