We introduce Balls-and-Bins sampling for differentially private (DP) optimization methods such as DP-SGD. While it has been common practice to use some form of shuffling in DP-SGD implementations, privacy accounting algorithms have typically assumed that Poisson subsampling is used instead. Recent work by Chua et al. (ICML 2024), however, pointed out that shuffling-based DP-SGD can have a much larger privacy cost in practical parameter regimes. We show that Balls-and-Bins sampling achieves the best of both samplers: its implementation is similar to that of Shuffling, and models trained using DP-SGD with Balls-and-Bins sampling achieve utility comparable to those trained using DP-SGD with Shuffling at the same noise multiplier; yet Balls-and-Bins sampling enjoys privacy amplification similar to or better than that of Poisson subsampling in practical regimes.
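The abstract does not spell out how the three samplers form batches, so the following is only a minimal sketch under stated assumptions. It assumes that Shuffling permutes the dataset and cuts it into consecutive fixed-size batches, that Poisson subsampling includes each example in each batch independently with probability 1/T, and that Balls-and-Bins sampling assigns each example independently and uniformly at random to one of T batches. The function names and the parameters `n` (dataset size) and `num_batches` (T) are hypothetical, chosen for illustration.

```python
import random


def shuffle_batches(n, num_batches, rng):
    """Assumed Shuffling: permute the examples, then cut into equal consecutive batches."""
    order = list(range(n))
    rng.shuffle(order)
    size = n // num_batches  # assumes num_batches divides n
    return [order[i * size:(i + 1) * size] for i in range(num_batches)]


def poisson_batches(n, num_batches, rng):
    """Assumed Poisson subsampling: every batch includes each example
    independently with probability 1 / num_batches."""
    q = 1.0 / num_batches
    return [[i for i in range(n) if rng.random() < q] for _ in range(num_batches)]


def balls_and_bins_batches(n, num_batches, rng):
    """Assumed Balls-and-Bins sampling: each example (ball) is placed in one
    batch (bin) chosen independently and uniformly at random."""
    batches = [[] for _ in range(num_batches)]
    for i in range(n):
        batches[rng.randrange(num_batches)].append(i)
    return batches


if __name__ == "__main__":
    rng = random.Random(0)
    for sampler in (shuffle_batches, poisson_batches, balls_and_bins_batches):
        print(sampler.__name__, [len(b) for b in sampler(12, 4, rng)])
```

Under these assumptions, Shuffling and Balls-and-Bins both use every example exactly once per epoch, which is one way to read the claim that their implementations are similar. Like Poisson subsampling, Balls-and-Bins produces batches of random size.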