We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be applied carefully. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have privacy guarantees similar to those of sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $\varepsilon \approx 1$ for Poisson subsampling and $\varepsilon > 10$ for sampling without replacement. This occurs for some parameters that could realistically be chosen for DP-SGD.
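The distinction between the two subsampling schemes can be made concrete with a minimal sketch (function names are illustrative, not from any particular accounting library): under Poisson subsampling each record is included independently with probability $q$, so the batch size is random, whereas sampling without replacement always returns a batch of fixed size.

```python
import random

def poisson_subsample(dataset, q):
    # Poisson subsampling: each record is included independently
    # with probability q, so the batch size follows Binomial(n, q).
    return [x for x in dataset if random.random() < q]

def subsample_without_replacement(dataset, batch_size):
    # Sampling without replacement: a uniformly random subset
    # of exactly batch_size distinct records.
    return random.sample(dataset, batch_size)

# Example: both schemes target an expected batch size of 10 out of 1000,
# but only the second guarantees that size exactly.
data = list(range(1000))
poisson_batch = poisson_subsample(data, q=0.01)        # size is random
fixed_batch = subsample_without_replacement(data, 10)  # size is exactly 10
```

Privacy accountants model these two batch-selection distributions differently, which is precisely why their $\varepsilon$ guarantees can diverge even at the same expected batch size.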