We present an algorithm for generating $\varepsilon$-differentially private synthetic data in a bounded metric space, with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $\mathcal X$ of $n$ points in the hypercube $[0,1]^d$, our algorithm generates a synthetic dataset $\mathcal Y$ such that the expected 1-Wasserstein distance between the empirical measures of $\mathcal X$ and $\mathcal Y$ is $O((\varepsilon n)^{-1/d})$ for $d\geq 2$ and $O(\log^2(\varepsilon n)(\varepsilon n)^{-1})$ for $d=1$. This accuracy guarantee is optimal up to a constant factor for $d\geq 2$ and up to a logarithmic factor for $d=1$. The algorithm runs in $O(\varepsilon n)$ time for all $d\geq 1$ and achieves better accuracy than the method of Boedihardjo et al. (2022) for $d\geq 2$.
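To make the utility metric concrete, the following is a minimal sketch for the case $d=1$. It is not the algorithm described above: it uses a standard Laplace-perturbed histogram as a simple baseline and makes no claim about that baseline's rate. The function names, the bin count, and the replace-one neighbouring relation (which gives the histogram an $\ell_1$ sensitivity of 2) are all assumptions made for illustration. The second function computes the 1-Wasserstein distance between two empirical measures on the line as $\int |F_{\mathcal X}(t) - F_{\mathcal Y}(t)|\,dt$.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. Exp(1) variables is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_histogram_synthetic(data, epsilon, bins, rng):
    """Baseline sketch (NOT the paper's method): Laplace-perturbed histogram on [0, 1]."""
    counts = [0] * bins
    for x in data:
        counts[min(int(x * bins), bins - 1)] += 1
    # Assumption: neighbouring datasets differ by replacing one point, so the
    # histogram has L1 sensitivity 2 and Laplace(2 / epsilon) noise per bin
    # gives epsilon-differential privacy.
    noisy = [max(0, round(c + laplace(2.0 / epsilon, rng))) for c in counts]
    # Clamping, rounding, and sampling bin centres are post-processing.
    width = 1.0 / bins
    return [(i + 0.5) * width for i, m in enumerate(noisy) for _ in range(m)]


def wasserstein1_1d(xs, ys):
    """1-Wasserstein distance between two empirical measures on the real line."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    # Sweep over all points, tracking the difference of the two empirical CDFs.
    events = sorted([(x, 1.0 / len(xs)) for x in xs] +
                    [(y, -1.0 / len(ys)) for y in ys])
    total, diff, prev = 0.0, 0.0, events[0][0]
    for t, w in events:
        total += abs(diff) * (t - prev)
        diff += w
        prev = t
    return total
```

A typical use would draw a dataset in $[0,1]$, call `private_histogram_synthetic` with a chosen $\varepsilon$ and bin count, and report `wasserstein1_1d(data, synthetic)` averaged over several random seeds to estimate the expected distance.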