Minimax optimal differentially private synthetic data for smooth queries

Differentially private synthetic data enables the sharing and analysis of sensitive datasets while providing rigorous privacy guarantees for individual contributors. A central challenge is to achieve strong utility guarantees for meaningful downstream analysis. Many existing methods ensure uniform accuracy over broad query classes, such as all Lipschitz functions, but this generality often leads to suboptimal rates for statistics of practical interest. Since many common data analysis queries exhibit smoothness beyond what worst-case Lipschitz bounds capture, we ask whether exploiting this additional structure can yield improved utility. We study the problem of generating $(\varepsilon,\delta)$-differentially private synthetic data from a dataset of size $n$ supported on the hypercube $[-1,1]^d$, with utility guarantees that hold uniformly over all smooth queries with bounded derivatives up to order $k$. We propose a polynomial-time algorithm that achieves the minimax error rate of $n^{-\min \{1, \frac{k}{d}\}}$, up to a $\log(n)$ factor. This characterization uncovers a phase transition at $k=d$. Our results generalize the Chebyshev moment matching framework of Musco et al. (2025) and Wang et al. (2016), and strictly improve the error rates for $k$-smooth queries established by Wang et al. (2016). Moreover, we establish the first minimax lower bound for the utility of $(\varepsilon,\delta)$-differentially private synthetic data with respect to $k$-smooth queries, extending the Wasserstein lower bound for $\varepsilon$-differential privacy of Boedihardjo et al. (2024).
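
To make the phase transition concrete, the exponent in the stated rate can be unpacked into its two regimes. This is only a restatement of the rate given above; it ignores the $\log(n)$ factor and any dependence on $\varepsilon$ and $\delta$, which the abstract does not specify.

$$
n^{-\min\{1,\, k/d\}} =
\begin{cases}
n^{-k/d}, & k < d, \\
n^{-1}, & k \ge d.
\end{cases}
$$

For instance, with $d = 4$, queries of smoothness $k = 2$ give the rate $n^{-1/2}$, while any $k \ge 4$ gives $n^{-1}$, so additional smoothness beyond $k = d$ does not improve the exponent.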
