We present a polynomial-time algorithm for online differentially private synthetic data generation. For a data stream within the hypercube $[0,1]^d$ and an infinite time horizon, we develop an online algorithm that generates a differentially private synthetic dataset at each time $t$. This algorithm achieves a near-optimal accuracy bound of $O(t^{-1/d}\log(t))$ for $d\geq 2$ and $O(t^{-1}\log^{4.5}(t))$ for $d=1$ in the 1-Wasserstein distance. This result generalizes the previous work on the continual release model for counting queries to include Lipschitz queries. Compared to the offline case, where the entire dataset is available at once, our approach requires only an extra polylog factor in the accuracy bound.
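To make explicit why a bound in the 1-Wasserstein distance covers Lipschitz queries, it may help to recall Kantorovich-Rubinstein duality. This is a standard fact rather than a statement taken from the abstract, and the notation is assumed here for illustration: $\mu_t$ denotes the empirical measure of the first $t$ points of the stream, $\nu_t$ the empirical measure of the synthetic dataset released at time $t$, and the Lipschitz constant is taken with respect to the underlying metric on $[0,1]^d$.

$$
W_1(\mu_t,\nu_t) \;=\; \sup_{\|f\|_{\mathrm{Lip}}\le 1}\,\left|\int_{[0,1]^d} f \,\mathrm{d}\mu_t \;-\; \int_{[0,1]^d} f \,\mathrm{d}\nu_t\right|
$$

Under this reading, an accuracy bound on $W_1(\mu_t,\nu_t)$ bounds the error of every 1-Lipschitz query simultaneously at time $t$.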