Minimax optimal differentially private synthetic data for smooth queries

Differentially private synthetic data enables the sharing and analysis of sensitive datasets while providing rigorous privacy guarantees for individual contributors. A central challenge is to achieve strong utility guarantees for meaningful downstream analysis. Many existing methods ensure uniform accuracy over broad query classes, such as all Lipschitz functions, but this generality often leads to suboptimal rates for statistics of practical interest. Since many common data analysis queries exhibit smoothness beyond what worst-case Lipschitz bounds capture, we ask whether exploiting this additional structure can yield improved utility. We study the problem of generating $(\varepsilon,\delta)$-differentially private synthetic data from a dataset of size $n$ supported on the hypercube $[-1,1]^d$, with utility guarantees that hold uniformly over all smooth queries with bounded derivatives up to order $k$. We propose a polynomial-time algorithm that achieves the minimax error rate $n^{-\min \{1, \frac{k}{d}\}}$, up to a $\log(n)$ factor. This characterization uncovers a phase transition at $k=d$. Our results generalize the Chebyshev moment matching framework of (Musco et al., 2025; Wang et al., 2016) and strictly improve the error rates for $k$-smooth queries established in (Wang et al., 2016). Moreover, we establish the first minimax lower bound for the utility of $(\varepsilon,\delta)$-differentially private synthetic data with respect to $k$-smooth queries, extending the Wasserstein lower bound for $\varepsilon$-differential privacy in (Boedihardjo et al., 2024).
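To make the phase transition at $k=d$ concrete, the following minimal sketch evaluates the exponent $\min\{1, \frac{k}{d}\}$ and the resulting rate $n^{-\min\{1, \frac{k}{d}\}}$ for a few smoothness levels. The function names `rate_exponent` and `error_rate` are hypothetical and introduced only for illustration; the sketch ignores constants, the $\log(n)$ factor, and any dependence on $\varepsilon$ and $\delta$, none of which are specified here.

```python
def rate_exponent(k: int, d: int) -> float:
    """Exponent min(1, k/d) appearing in the rate n^{-min(1, k/d)}."""
    if k < 1 or d < 1:
        raise ValueError("k and d must be positive integers")
    return min(1.0, k / d)


def error_rate(n: int, k: int, d: int) -> float:
    """n^{-min(1, k/d)}: constants, log(n) factor, and privacy parameters omitted."""
    return n ** (-rate_exponent(k, d))


if __name__ == "__main__":
    n, d = 10_000, 4  # illustrative values, not taken from the paper
    for k in range(1, 7):
        # The exponent grows linearly in k until k = d, then stays at 1.
        print(f"k={k}: exponent={rate_exponent(k, d):.2f}, "
              f"rate={error_rate(n, k, d):.2e}")
```

For $k < d$ the exponent $\frac{k}{d}$ improves with each additional order of smoothness, while for $k \ge d$ it saturates at $1$, so further smoothness no longer improves the rate.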
