Given an empirical distribution $f(x)$ of sensitive data $x$, we consider the task of minimizing $F(y) = D_{\text{KL}} (f(x)\Vert y)$ over the probability simplex while protecting the privacy of $x$. We observe that instantiating the exponential mechanism with the KL divergence as the loss function yields the Dirichlet mechanism, which outputs a single draw from a Dirichlet distribution. Motivated by this, we propose a R\'enyi differentially private (RDP) algorithm that employs the Dirichlet mechanism to solve the KL divergence minimization task. In addition, for $f(x)$ as above and an output $\hat{y}$ of the Dirichlet mechanism, we prove a tail bound on $D_{\text{KL}} (f(x)\Vert \hat{y})$, which we then use to derive a lower bound for the sample complexity of our RDP algorithm. Experiments on real-world datasets demonstrate advantages of our algorithm over the Gaussian and Laplace mechanisms in supervised classification and maximum likelihood estimation.
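To make the link between the exponential mechanism and the Dirichlet distribution concrete: a density on the simplex proportional to $\exp(-r\, D_{\text{KL}}(p\Vert y))$ is proportional to $\prod_i y_i^{r p_i}$, which is a Dirichlet density with parameters $r p_i + 1$. The following is a minimal sketch of that sampling step only. The function names, the scale `r`, the offset `alpha`, and the toy counts are illustrative assumptions, not the paper's notation, and the sketch does not show how to calibrate `r` or `alpha` to a target RDP guarantee.

```python
import numpy as np


def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions, using the convention 0 * log 0 = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def dirichlet_mechanism(p, r, alpha=1.0, rng=None):
    """Return a single draw from Dirichlet(r * p + alpha).

    `r` (scale) and `alpha` (offset) are hypothetical parameter names chosen
    for this sketch; alpha = 1 matches the density exp(-r * KL(p || y)).
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    return rng.dirichlet(r * p + alpha)


# Toy empirical distribution built from made-up category counts.
counts = np.array([30, 50, 20])
p = counts / counts.sum()

rng = np.random.default_rng(0)
y_hat = dirichlet_mechanism(p, r=200.0, rng=rng)
loss = kl_divergence(p, y_hat)
```

In this sketch a larger `r` concentrates the draw around `p`, so `loss` tends to shrink, while a smaller `r` adds more randomness. How that trade-off maps to privacy parameters is the subject of the analysis and is not encoded here.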