We present an efficient algorithm for generating $\varepsilon$-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $X$ of $n$ points in the hypercube $[0,1]^d$, our algorithm generates a synthetic dataset $Y$ such that the expected 1-Wasserstein distance between the empirical measures of $X$ and $Y$ is $O((\varepsilon n)^{-1/d})$ for $d\geq 2$ and $O(\log^2(\varepsilon n)(\varepsilon n)^{-1})$ for $d=1$. This accuracy guarantee is optimal up to a constant factor for $d\geq 2$ and up to a logarithmic factor for $d=1$. Our algorithm runs in $O(\varepsilon n)$ time for all $d\geq 1$ and achieves better accuracy than the method in (Boedihardjo et al., 2022) for $d\geq 2$.
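To make the utility metric concrete, the following minimal sketch computes the 1-Wasserstein distance between two empirical measures for $d=1$ and pairs it with a simple private baseline. The baseline is a standard Laplace noisy histogram, not the algorithm of this paper, and it is not claimed to attain the rates stated above. The function names `wasserstein1_1d` and `noisy_histogram_synthetic`, the number of bins, and the choice of neighboring datasets (one point replaced, so the histogram has $\ell_1$-sensitivity 2) are all assumptions made for illustration.

```python
import random


def wasserstein1_1d(xs, ys):
    """1-Wasserstein distance between the empirical measures of xs and ys.

    In one dimension this equals the integral of |F_X(t) - F_Y(t)| dt,
    where F_X and F_Y are the empirical CDFs.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    i = j = 0
    total = 0.0
    for a, b in zip(points, points[1:]):
        # Advance the counters so that i/len(xs) = F_X(a) and j/len(ys) = F_Y(a).
        while i < len(xs) and xs[i] <= a:
            i += 1
        while j < len(ys) and ys[j] <= a:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (b - a)
    return total


def noisy_histogram_synthetic(xs, epsilon, bins, rng):
    """Illustrative epsilon-DP baseline on [0, 1] (NOT the paper's algorithm).

    Assumption: neighboring datasets differ by replacing one point, so the
    vector of bin counts has L1-sensitivity 2 and Laplace noise of scale
    2 / epsilon per bin suffices. Rounding and clipping are post-processing.
    """
    counts = [0] * bins
    for x in xs:
        k = min(int(x * bins), bins - 1)  # x == 1.0 falls in the last bin
        counts[k] += 1
    scale = 2.0 / epsilon
    synthetic = []
    for k, c in enumerate(counts):
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with scale `scale`.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        m = max(0, round(c + noise))
        synthetic.extend([(k + 0.5) / bins] * m)  # place points at bin centers
    # Fall back to a single point if every noisy count was clipped to zero.
    return synthetic or [0.5]
```

Calling `noisy_histogram_synthetic(X, epsilon, bins, random.Random(0))` and then `wasserstein1_1d(X, Y)` gives one sample of the error whose expectation the guarantees above bound. In this baseline the bin width controls the discretization error and the Laplace scale controls the privacy noise.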