We provide new lower bounds on the privacy guarantee of the multi-epoch Adaptive Batch Linear Queries (ABLQ) mechanism with shuffled batch sampling, demonstrating substantial gaps compared to Poisson subsampling; prior analysis was limited to a single epoch. Since the privacy analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) is obtained by analyzing the ABLQ mechanism, this calls into serious question the common practice of implementing DP-SGD with shuffling while reporting privacy parameters as if Poisson subsampling were used. To understand the impact of this gap on the utility of trained machine learning models, we introduce a practical approach to implementing Poisson subsampling at scale using massively parallel computation, and use it to train models efficiently. We then compare the utility of models trained with Poisson-subsampling-based DP-SGD against optimistic estimates of the utility achievable with shuffling, obtained via our new lower bounds on the privacy guarantee of ABLQ with shuffling.
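To make the two batch-sampling schemes concrete, here is a minimal sketch in plain Python. It illustrates only how shuffled batches and Poisson-subsampled batches are formed. It is not the scalable, massively parallel implementation described above. The function names `shuffled_batches` and `poisson_batches` are hypothetical. Dropping the final partial batch in the shuffled case is an illustrative assumption.

```python
import random
from typing import List


def shuffled_batches(n: int, batch_size: int, epochs: int, seed: int = 0) -> List[List[int]]:
    """Shuffle-based batching (hypothetical helper): each epoch draws a fresh
    random permutation of the n example indices and splits it into
    consecutive batches of exactly `batch_size` examples.

    Assumption for illustration: any leftover examples that do not fill a
    whole batch are dropped, so every batch has the same size.
    """
    rng = random.Random(seed)
    batches: List[List[int]] = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)  # new permutation each epoch
        for start in range(0, n - batch_size + 1, batch_size):
            batches.append(perm[start:start + batch_size])
    return batches


def poisson_batches(n: int, expected_batch_size: int, steps: int, seed: int = 0) -> List[List[int]]:
    """Poisson subsampling (hypothetical helper): at every step, each example
    is included independently with probability q = expected_batch_size / n.

    Batch sizes are therefore random (Binomial(n, q)), and a given example
    may appear in any number of batches across steps, including zero.
    """
    q = expected_batch_size / n
    rng = random.Random(seed)
    return [[i for i in range(n) if rng.random() < q] for _ in range(steps)]
```

The key structural difference is visible in the output: shuffling yields fixed-size batches in which every (non-dropped) example appears exactly once per epoch, whereas Poisson subsampling yields variable-size batches with independent inclusion at every step. Variable batch sizes are one plausible reason Poisson subsampling is harder to implement efficiently at scale than shuffling.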