We present a polynomial-time algorithm for online differentially private synthetic data generation. Given a data stream in the hypercube $[0,1]^d$ over an infinite time horizon, we develop an online algorithm that generates a differentially private synthetic dataset at each time $t$. The algorithm achieves a near-optimal accuracy bound of $O(\log(t)t^{-1/d})$ for $d\geq 2$ and $O(\log^{4.5}(t)t^{-1})$ for $d=1$ in the 1-Wasserstein distance. This result extends previous work on the continual release model from counting queries to Lipschitz queries. Compared to the offline case, where the entire dataset is available at once, our approach incurs only an extra polylogarithmic factor in the accuracy bound.
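
To make the accuracy metric concrete, the following minimal sketch computes the 1-Wasserstein distance between two equal-size datasets in the $d=1$ case and evaluates the stated rates with all hidden constants set to 1. It is not the algorithm of this work: the function names `wasserstein1_1d` and `rate_bound` are hypothetical, and the equal-size restriction and unit constants are assumptions made only for illustration.

```python
import math


def wasserstein1_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical measures on the line.

    In one dimension the optimal coupling matches sorted points in order,
    so the distance is the mean absolute gap between order statistics.
    Assumption: both datasets have the same number of points.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes non-empty datasets of equal size")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


def rate_bound(t, d):
    """Shape of the accuracy bound from the abstract, with constants set to 1.

    Assumption: the O(.) constants are unknown here, so only the growth
    in t and d is meaningful, not the absolute values.
    """
    if t <= 1 or d < 1:
        raise ValueError("expects t > 1 and d >= 1")
    if d == 1:
        return math.log(t) ** 4.5 / t
    return math.log(t) * t ** (-1.0 / d)


if __name__ == "__main__":
    true_data = [0.0, 1.0]
    synthetic = [0.5, 0.5]
    print(wasserstein1_1d(true_data, synthetic))
    for d in (1, 2, 5):
        print(d, rate_bound(10**6, d))
```

Under these assumptions, a synthetic dataset is accurate when `wasserstein1_1d` between it and the true data is small, and `rate_bound` shows how the guaranteed error decays more slowly as the dimension $d$ grows.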