We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under two types of batch sampling: (i) shuffling and (ii) Poisson subsampling. The typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) proceeds by interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is more commonly used in practical implementations, its privacy is not easy to analyze either analytically or numerically. Poisson-subsampling-based DP-SGD, on the other hand, is challenging to implement scalably, but its privacy analysis is well understood, with multiple open-source, numerically tight privacy accountants available. This has led to a common practice of running shuffling-based DP-SGD while reporting the privacy analysis of the corresponding Poisson-subsampling version. Our result shows that there can be a substantial gap between the privacy analyses of the two types of batch sampling, and thus advises caution in reporting privacy parameters for DP-SGD.
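To make the two batch-sampling schemes concrete, the following is a minimal sketch of how each one forms batches over a dataset of `n` examples. It is an illustration only, not an implementation from this work: the function names `shuffle_batches` and `poisson_batches` are hypothetical, and the Poisson sampling probability is assumed to be `1 / num_batches`, so that the expected batch size matches the fixed batch size used under shuffling.

```python
import random


def shuffle_batches(n, batch_size, rng):
    """One epoch of shuffling: permute all indices, then cut into consecutive batches."""
    perm = list(range(n))
    rng.shuffle(perm)
    return [perm[i:i + batch_size] for i in range(0, n, batch_size)]


def poisson_batches(n, num_batches, rng):
    """Poisson subsampling: each example joins each batch independently with probability q."""
    q = 1.0 / num_batches  # assumption: chosen so the expected batch size is n / num_batches
    return [[i for i in range(n) if rng.random() < q] for _ in range(num_batches)]


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
n, batch_size = 12, 4
shuffled = shuffle_batches(n, batch_size, rng)
poisson = poisson_batches(n, n // batch_size, rng)

print("shuffle batch sizes:", [len(b) for b in shuffled])
print("poisson batch sizes:", [len(b) for b in poisson])
```

Under shuffling, every example appears in exactly one batch per epoch and all batches have a fixed size (when `batch_size` divides `n`). Under Poisson subsampling, batch sizes are random, and a given example may appear in several batches or in none.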