Differential privacy schemes have been widely adopted in recent years to address data privacy protection. We propose a new Gaussian scheme combined with another data protection technique, called random orthogonal matrix masking, to achieve $(\varepsilon, \delta)$-differential privacy (DP) more efficiently. We prove that the additional matrix masking significantly reduces the rate of the noise variance required by the Gaussian scheme to achieve $(\varepsilon, \delta)$-DP in the big data setting. Specifically, when $\varepsilon \to 0$, $\delta \to 0$, and the sample size $n$ exceeds the number $p$ of attributes by $(n-p)=O(\ln(1/\delta))$, the additive noise variance required to achieve $(\varepsilon, \delta)$-DP is reduced from $O(\ln(1/\delta)/\varepsilon^2)$ to $O(1/\varepsilon)$. Because much less noise is added, the resulting differentially private pseudo data sets allow much more accurate inferences, and can thus significantly broaden the scope of application for differential privacy.
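To make the two ingredients concrete, the following is a minimal Python sketch and not the paper's exact construction. It assumes the data form an $n \times p$ matrix, that masking means left-multiplication by a random $n \times n$ orthogonal matrix, and that noise is added before masking; the function names, the order of operations, and the choice of `sigma` are illustrative assumptions. The classical calibration $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (valid for $0 < \varepsilon < 1$) is included only as the $O(\ln(1/\delta)/\varepsilon^2)$ variance baseline.

```python
import numpy as np


def classical_gaussian_sigma(epsilon, delta, sensitivity=1.0):
    """Noise std of the classical Gaussian mechanism (for 0 < epsilon < 1).

    Baseline only: its variance scales as ln(1/delta) / epsilon**2.
    """
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon


def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs using diag(r) so the draw is uniformly (Haar) distributed.
    return q * np.sign(np.diag(r))


def masked_gaussian_release(x, sigma, rng):
    """Add i.i.d. Gaussian noise to x, then mask with a random orthogonal matrix.

    Assumption: noise-then-mask ordering; `sigma` must be chosen by the caller
    according to the privacy analysis, which this sketch does not reproduce.
    """
    n, p = x.shape
    noisy = x + sigma * rng.standard_normal((n, p))
    return random_orthogonal(n, rng) @ noisy


rng = np.random.default_rng(0)
x = rng.standard_normal((50, 3))          # toy data: n = 50 records, p = 3 attributes
release = masked_gaussian_release(x, sigma=0.5, rng=rng)
```

Orthogonal masking preserves the Gram matrix of whatever it multiplies, so statistics that depend only on $X^\top X$ (for example, least-squares coefficients) are distorted by the added noise alone and not by the masking step.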