Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but it is not readily applicable to the newer state-of-the-art algorithms. These algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise rather than the independent noise of DP-SGD. In this paper, we propose "MMCC", the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $\epsilon\to0$. To handle the correlated outputs in MMCC, we prove that they can be analyzed as if they were independent by conditioning them on prior outputs. Our "conditional composition theorem" has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our amplification algorithm also has practical utility: we show empirically that it significantly improves the privacy-utility trade-offs for DP-FTRL algorithms on standard benchmarks.
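To make the contrast between independent and correlated noise concrete, the following is a minimal sketch of a matrix mechanism for the prefix-sum workload, not the MMCC algorithm itself. It assumes a factorization $A = BC$ of the workload matrix and releases $B(Cx + z)$ with Gaussian $z$. The function names (`prefix_sum_workload`, `sqrt_factor`, `max_column_norm`, `matrix_mechanism`) and the choice of the square-root factorization $B = C = A^{1/2}$ are illustrative assumptions, not taken from the paper. Setting $C = I$ and $B = A$ gives independent per-step noise, as in DP-SGD.

```python
import numpy as np


def prefix_sum_workload(n):
    """Lower-triangular all-ones matrix A: row t of A @ x is the sum of x[0..t]."""
    return np.tril(np.ones((n, n)))


def sqrt_factor(n):
    """Lower-triangular Toeplitz matrix C with C @ C == A (illustrative choice).

    The entries are the power-series coefficients of (1 - x)^(-1/2):
    c_0 = 1 and c_k = c_{k-1} * (2k - 1) / (2k).
    """
    coeffs = np.ones(n)
    for k in range(1, n):
        coeffs[k] = coeffs[k - 1] * (2 * k - 1) / (2 * k)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            C[i, j] = coeffs[i - j]
    return C


def max_column_norm(C):
    """Largest column L2 norm of C.

    Under the assumption that each example contributes to a single row of x
    with norm at most 1, this is the L2 sensitivity of the map x -> C @ x.
    """
    return float(np.max(np.linalg.norm(C, axis=0)))


def matrix_mechanism(x, B, C, sigma, rng):
    """Release B @ (C @ x + z) with z ~ N(0, sigma^2) i.i.d.

    The noise in the output is B @ z, which is correlated across rows
    whenever B is not diagonal.
    """
    z = rng.normal(0.0, sigma, size=(C.shape[0],) + x.shape[1:])
    return B @ (C @ x + z)


if __name__ == "__main__":
    n = 8
    rng = np.random.default_rng(0)
    x = rng.normal(size=(n, 3))  # stand-in for clipped per-step gradient sums
    A = prefix_sum_workload(n)
    C = sqrt_factor(n)

    # Correlated noise: B = C = A^{1/2}.
    correlated = matrix_mechanism(x, C, C, sigma=max_column_norm(C), rng=rng)
    # Independent per-step noise: C = I, B = A.
    independent = matrix_mechanism(x, A, np.eye(n), sigma=1.0, rng=rng)
    print(correlated.shape, independent.shape)
```

In this sketch the noise scale is set proportional to `max_column_norm(C)`, which only covers the case where each example participates once. The sketch does not perform any amplification analysis; it only shows where the correlation between outputs comes from.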