Differentially private mean estimation is an important building block in privacy-preserving algorithms for data analysis and machine learning. Though the trade-off between privacy and utility is well understood in the worst case, many datasets exhibit structure that could be exploited to yield better algorithms. In this paper we present $\textit{Private Limit Adapted Noise (PLAN)}$, a family of differentially private algorithms for mean estimation in the setting where inputs are independently sampled from a distribution $\mathcal{D}$ over $\mathbf{R}^d$, with coordinate-wise standard deviations $\boldsymbol{\sigma} \in \mathbf{R}^d$. Similar to mean estimation under Mahalanobis distance, PLAN tailors the shape of the noise to the shape of the data, but unlike previous algorithms the privacy budget is spent non-uniformly over the coordinates. Under a concentration assumption on $\mathcal{D}$, we show how to exploit skew in the vector $\boldsymbol{\sigma}$, obtaining a (zero-concentrated) differentially private mean estimate with $\ell_2$ error proportional to $\|\boldsymbol{\sigma}\|_1$. Previous work has either not taken $\boldsymbol{\sigma}$ into account or measured error in Mahalanobis distance, in both cases resulting in $\ell_2$ error proportional to $\sqrt{d}\|\boldsymbol{\sigma}\|_2$, which can be up to a factor $\sqrt{d}$ larger. To verify the effectiveness of PLAN, we empirically evaluate accuracy on both synthetic and real-world data.
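To make the gap between the two error scales concrete, the following is a minimal sketch that compares $\|\boldsymbol{\sigma}\|_1$ with $\sqrt{d}\|\boldsymbol{\sigma}\|_2$ for a given vector of standard deviations. It is not the PLAN algorithm itself: the privacy parameter, sample size, and constant factors are all omitted, and the function names (`l1_norm`, `l2_norm`, `error_scales`) and the example vectors are hypothetical choices made for illustration only.

```python
import math


def l1_norm(sigma):
    """Sum of absolute values of the entries of sigma."""
    return sum(abs(s) for s in sigma)


def l2_norm(sigma):
    """Euclidean norm of sigma."""
    return math.sqrt(sum(s * s for s in sigma))


def error_scales(sigma):
    """Return (||sigma||_1, sqrt(d) * ||sigma||_2).

    These are only the sigma-dependent factors named in the abstract;
    constants and privacy/sample-size terms are deliberately left out.
    """
    d = len(sigma)
    return l1_norm(sigma), math.sqrt(d) * l2_norm(sigma)


if __name__ == "__main__":
    d = 100
    # Illustrative inputs (assumptions, not data from the paper).
    uniform = [1.0] * d                # no skew: all coordinates alike
    skewed = [1.0] + [0.01] * (d - 1)  # one dominant coordinate

    for name, sigma in [("uniform", uniform), ("skewed", skewed)]:
        l1_scale, baseline_scale = error_scales(sigma)
        print(name, l1_scale, baseline_scale, baseline_scale / l1_scale)
```

By the Cauchy-Schwarz inequality, $\|\boldsymbol{\sigma}\|_1 \le \sqrt{d}\|\boldsymbol{\sigma}\|_2$ always holds, with equality when all coordinates have the same magnitude. The ratio therefore stays at 1 for the uniform vector and grows toward $\sqrt{d}$ as the mass of $\boldsymbol{\sigma}$ concentrates on a few coordinates.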