ML models are ubiquitous in real-world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP to complex real-world ML models are still few and far between. The adoption of DP is hindered by limited practical guidance on what DP protection entails and what privacy guarantees to aim for, and by the difficulty of achieving good privacy-utility-computation trade-offs for ML models. Tricks for tuning and maximizing performance are scattered among papers or stored in the heads of practitioners. Furthermore, the literature seems to present conflicting evidence on how and whether to apply architectural adjustments and which components are "safe" to use with DP. This work is a self-contained guide that gives an in-depth overview of the field of DP ML and presents information about achieving the best possible DP ML model with rigorous privacy guarantees. Our target audience is both researchers and practitioners. Researchers interested in DP for ML will benefit from a clear overview of current advances and areas for improvement. We include theory-focused sections that highlight important topics such as privacy accounting and its assumptions, and convergence. For practitioners, we provide a background in DP theory and a clear step-by-step guide for choosing an appropriate privacy definition and approach, implementing DP training, potentially updating the model architecture, and tuning hyperparameters. For both researchers and practitioners, consistently and fully reporting privacy guarantees is critical, and so we propose a set of specific best practices for stating guarantees.