We present a practical method for auditing the differential privacy (DP) guarantees of a machine learning model using a small hold-out dataset that is not exposed to the model during training. Given a score function, such as the loss function used during training, our method estimates the total variation (TV) distance between the scores obtained on a subset of the training data and those obtained on the hold-out dataset. With some meta information about the underlying DP training algorithm, these TV distance estimates can be converted into $(\varepsilon,\delta)$-guarantees for any $\delta$. We show that these score distributions asymptotically give lower bounds on the DP guarantees of the underlying training algorithm; for practicality, however, we perform a one-shot estimation. We specify conditions under which this estimation yields lower bounds on the DP guarantees with high probability. To estimate the TV distance between the score distributions, we use a simple density estimation method based on histograms. We show that the resulting estimator is nearly optimally robust and attains an error rate of $\mathcal{O}(k^{-1/3})$, where $k$ is the total number of samples. Numerical experiments on benchmark datasets illustrate the effectiveness of our approach and show improvements over baseline methods for black-box auditing.
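As a rough illustration of the histogram-based estimator described above, the following Python sketch estimates the TV distance between two sets of scalar scores. The function name `estimate_tv_distance` and the $k^{1/3}$ bin-count rule are our own illustrative choices here, chosen to echo the $\mathcal{O}(k^{-1/3})$ rate; they are not necessarily the paper's exact procedure.

```python
import numpy as np

def estimate_tv_distance(scores_train, scores_holdout, n_bins=None):
    """Estimate the TV distance between two score distributions from
    samples, via a shared histogram:
        TV(P, Q) ~= (1/2) * sum_i |p_i - q_i| over the bins.
    """
    scores_train = np.asarray(scores_train, dtype=float)
    scores_holdout = np.asarray(scores_holdout, dtype=float)
    k = len(scores_train) + len(scores_holdout)
    # Illustrative bin-count rule ~ k^(1/3), matching the k^(-1/3)
    # error rate of histogram density estimation.
    if n_bins is None:
        n_bins = max(1, int(round(k ** (1.0 / 3.0))))
    # Common bin edges spanning both samples.
    lo = min(scores_train.min(), scores_holdout.min())
    hi = max(scores_train.max(), scores_holdout.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(scores_train, bins=edges)
    q, _ = np.histogram(scores_holdout, bins=edges)
    # Normalize counts to empirical probabilities per bin.
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()
```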
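The conversion of a TV estimate into an $(\varepsilon,\delta)$-guarantee depends on the meta information available about the training algorithm. As one hypothetical instantiation, the sketch below assumes the mechanism is dominated by a Gaussian mechanism with sensitivity one: it inverts $\mathrm{TV} = 2\Phi\!\left(\tfrac{1}{2\sigma}\right) - 1$ to recover an effective noise scale $\sigma$, then bisects the Gaussian mechanism's privacy profile $\delta(\varepsilon) = \Phi\!\left(\tfrac{1}{2\sigma} - \varepsilon\sigma\right) - e^{\varepsilon}\,\Phi\!\left(-\tfrac{1}{2\sigma} - \varepsilon\sigma\right)$ (Balle and Wang, 2018). The function name and search bounds are illustrative assumptions, not the paper's prescribed interface.

```python
import numpy as np
from scipy.stats import norm

def tv_to_epsilon(tv, delta, tol=1e-9):
    """Convert an estimated TV distance to an epsilon at a given delta,
    assuming (hypothetically) a dominating Gaussian mechanism with
    sensitivity 1.
    """
    # Clip away degenerate TV values to avoid ppf(0.5) = 0 or ppf(1) = inf.
    tv = float(np.clip(tv, 1e-12, 1.0 - 1e-12))
    # Invert TV = 2*Phi(1/(2*sigma)) - 1 for the effective noise scale.
    sigma = 1.0 / (2.0 * norm.ppf((1.0 + tv) / 2.0))

    def delta_of(eps):
        # Gaussian-mechanism privacy profile (Balle & Wang, 2018).
        return (norm.cdf(1.0 / (2.0 * sigma) - eps * sigma)
                - np.exp(eps) * norm.cdf(-1.0 / (2.0 * sigma) - eps * sigma))

    # delta(0) equals the TV distance; if the target delta already
    # exceeds it, epsilon = 0 suffices.
    if delta_of(0.0) <= delta:
        return 0.0
    # delta(eps) is decreasing in eps, so bisect for the target delta.
    # hi_eps caps the search range for this sketch.
    lo_eps, hi_eps = 0.0, 100.0
    while hi_eps - lo_eps > tol:
        mid = 0.5 * (lo_eps + hi_eps)
        if delta_of(mid) > delta:
            lo_eps = mid
        else:
            hi_eps = mid
    return 0.5 * (lo_eps + hi_eps)
```

For example, under these assumptions, `tv_to_epsilon(estimate_tv_distance(s_train, s_holdout), delta=1e-5)` would give the empirical $\varepsilon$ at $\delta = 10^{-5}$ for score arrays `s_train` and `s_holdout`.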