Current practices for reporting differential privacy (DP) guarantees for machine learning (ML) algorithms such as DP-SGD provide an incomplete and potentially misleading picture. For instance, if only a single $(\varepsilon, \delta)$ pair is known about a mechanism, standard analyses suggest that highly accurate inference attacks against training data records could exist, whereas a more careful analysis shows that such attacks do not exist for most practical mechanisms. In this position paper, we argue that using _non-asymptotic_ Gaussian Differential Privacy (GDP) as the primary means of communicating DP guarantees in ML avoids these downsides. We build on two recent developments in the DP literature: (i) open-source numerical accountants capable of computing the privacy profile and $f$-DP curves of DP-SGD to arbitrary accuracy, and (ii) a decision-theoretic metric over DP representations. With these, we show how to obtain non-asymptotic bounds on GDP from numerical accountants, and that GDP can capture the entire privacy profile of DP-SGD and related algorithms with virtually no error, as quantified by the metric. To support our claims, we investigate the privacy profiles of state-of-the-art DP large-scale image classification and of the TopDown algorithm for the U.S. Decennial Census, observing that GDP fits their profiles remarkably well in all cases. We conclude by discussing the strengths and weaknesses of this approach and which other privacy mechanisms could benefit from GDP.