We present an efficient algorithm for generating $\varepsilon$-differentially private synthetic data in a bounded metric space, with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $X$ of $n$ points in the hypercube $[0,1]^d$, our algorithm generates a synthetic dataset $Y$ such that the expected 1-Wasserstein distance between the empirical measures of $X$ and $Y$ is $O((\varepsilon n)^{-1/d})$ for $d\geq 2$, and $O(\log^2(\varepsilon n)(\varepsilon n)^{-1})$ for $d=1$. This accuracy guarantee is optimal up to a constant factor for $d\geq 2$, and up to a logarithmic factor for $d=1$. Our algorithm has a fast running time of $O(\varepsilon dn)$ for all $d\geq 1$ and demonstrates improved accuracy compared to the method of (Boedihardjo et al., 2022) for $d\geq 2$.
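To make the quantities in the guarantee concrete, the following is a minimal sketch for the case $d=1$. It is not the algorithm of this paper: it uses a standard Laplace-noised histogram as a stand-in baseline, and it assumes replace-one adjacency with the sample size $n$ treated as public. The names `wasserstein1_1d`, `stated_rate`, and `noisy_histogram_synthetic` are hypothetical, and `stated_rate` sets the unspecified constant in the $O(\cdot)$ bound to 1 purely for illustration.

```python
import math
import random


def wasserstein1_1d(xs, ys):
    """Exact 1-Wasserstein distance between two equal-size empirical measures on the line."""
    if len(xs) != len(ys):
        raise ValueError("this shortcut needs samples of equal size")
    # In one dimension the optimal coupling matches sorted points in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def stated_rate(n, eps, d):
    """Rate from the abstract with the hidden constant set to 1 (illustration only)."""
    m = eps * n
    return math.log(m) ** 2 / m if d == 1 else m ** (-1.0 / d)


def noisy_histogram_synthetic(xs, eps, bins, rng):
    """Baseline (NOT the paper's method): Laplace-noised histogram on [0, 1]."""
    counts = [0] * bins
    for x in xs:
        counts[min(int(x * bins), bins - 1)] += 1
    # Assumption: replace-one adjacency, so the histogram has L1 sensitivity 2
    # and Laplace noise of scale 2/eps gives eps-DP; the rest is post-processing.
    scale = 2.0 / eps
    noisy = []
    for c in counts:
        # A difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy.append(max(0.0, c + noise))
    if sum(noisy) == 0.0:
        noisy = [1.0] * bins  # degenerate case: fall back to uniform weights
    chosen = rng.choices(range(bins), weights=noisy, k=len(xs))
    # Place each synthetic point uniformly inside its chosen bin.
    return [(b + rng.random()) / bins for b in chosen]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.betavariate(2, 5) for _ in range(2000)]
    synthetic = noisy_histogram_synthetic(data, eps=1.0, bins=32, rng=rng)
    print("empirical W1:", wasserstein1_1d(data, synthetic))
    print("stated rate (constant 1):", stated_rate(len(data), 1.0, 1))
```

The printed distance is a single random draw, whereas the guarantee concerns the expectation, so the two numbers are only loosely comparable. Floating-point Laplace sampling as written here is also not suitable for production privacy guarantees.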