We introduce Balls-and-Bins sampling for differentially private (DP) optimization methods such as DP-SGD. While it has been common practice to use some form of shuffling in DP-SGD implementations, privacy accounting algorithms have typically assumed that Poisson subsampling is used instead. Recent work by Chua et al. (ICML 2024), however, pointed out that shuffling-based DP-SGD can have a much larger privacy cost in practical parameter regimes than the Poisson-based accounting indicates. In this work we show that Balls-and-Bins sampling achieves the best of both samplers: its implementation is similar to that of Shuffling, and models trained using DP-SGD with Balls-and-Bins sampling achieve utility comparable to those trained using DP-SGD with Shuffling at the same noise multiplier; yet Balls-and-Bins sampling enjoys similar-or-better privacy amplification compared to Poisson subsampling in practical regimes.
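To make the comparison between the three samplers concrete, the following is a minimal sketch of how each one might form batches of example indices. The abstract does not define the samplers, so the definitions here are assumptions: Shuffling permutes the dataset and cuts it into equal-size batches, Poisson subsampling includes each example in each batch independently with probability `1 / num_batches`, and Balls-and-Bins sampling assigns each example independently to one uniformly random batch. The function names and parameters (`n`, `num_batches`, `rng`) are hypothetical and used only for illustration.

```python
import random


def shuffle_batches(n, num_batches, rng):
    """Shuffling (assumed form): permute all examples, then cut into equal batches."""
    order = list(range(n))
    rng.shuffle(order)
    size = n // num_batches  # assumes num_batches divides n
    return [sorted(order[t * size:(t + 1) * size]) for t in range(num_batches)]


def poisson_batches(n, num_batches, rng):
    """Poisson subsampling (assumed form): each batch includes each example
    independently with probability 1 / num_batches."""
    q = 1.0 / num_batches
    return [[i for i in range(n) if rng.random() < q] for _ in range(num_batches)]


def balls_and_bins_batches(n, num_batches, rng):
    """Balls-and-Bins (assumed form): each example (ball) is placed in exactly
    one batch (bin), chosen uniformly at random and independently of the others."""
    batches = [[] for _ in range(num_batches)]
    for i in range(n):
        batches[rng.randrange(num_batches)].append(i)
    return batches


if __name__ == "__main__":
    rng = random.Random(0)
    for sampler in (shuffle_batches, poisson_batches, balls_and_bins_batches):
        print(sampler.__name__, sampler(12, 4, rng))
```

Under these assumed definitions, Shuffling and Balls-and-Bins both use every example exactly once per pass, which is the sense in which their implementations are similar. Batch sizes are fixed under Shuffling but random under Balls-and-Bins, as they are under Poisson subsampling.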