Privacy protection with synthetic data generation often uses differentially private statistics and model parameters to quantitatively express theoretical security. However, these methods do not take into account the privacy protection that arises from the randomness of data generation itself. In this paper, we theoretically evaluate the R\'{e}nyi differential privacy provided by the randomness in data generation for a synthetic data generation method that uses the mean vector and the covariance matrix of an original dataset. Specifically, for a fixed $\alpha > 1$, we give conditions on $\varepsilon$ under which the synthetic data generation satisfies $(\alpha, \varepsilon)$-R\'{e}nyi differential privacy under a bounded neighboring condition and an unbounded neighboring condition, respectively. In particular, under the unbounded condition, when the original dataset and the synthetic dataset are both of size 10 million, the mechanism satisfies $(4, 0.576)$-R\'{e}nyi differential privacy. We also show that, when this guarantee is translated into traditional $(\varepsilon, \delta)$-differential privacy, the mechanism satisfies $(4.00, 10^{-10})$-differential privacy.
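To make the kind of mechanism under study concrete, the following is a minimal sketch of synthetic data generation from a mean vector and a covariance matrix. It assumes that synthetic records are drawn from a multivariate normal distribution fitted to the original data and that the covariance uses $1/n$ normalization; neither detail is stated in the abstract. All function names (`mean_vector`, `covariance_matrix`, `cholesky`, `generate_synthetic`) are hypothetical and chosen for illustration only.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(d)]


def covariance_matrix(rows, mu):
    """Covariance matrix with 1/n normalization (an assumption of this sketch)."""
    n, d = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def generate_synthetic(rows, m, seed=None):
    """Draw m synthetic rows from N(mu, Sigma) fitted to the original rows.

    Gaussian sampling is an assumption of this sketch, not a detail
    confirmed by the abstract.
    """
    rng = random.Random(seed)
    mu = mean_vector(rows)
    low = cholesky(covariance_matrix(rows, mu))
    d = len(mu)
    out = []
    for _ in range(m):
        # Standard normal draws, then the affine map x = mu + L z.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mu[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


original = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
synthetic = generate_synthetic(original, m=5, seed=0)
```

In this sketch the only randomness is the sampling step: the mean vector and covariance matrix are computed exactly, with no noise added to them. That sampling randomness is the source of privacy that the abstract proposes to quantify.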