Given a dataset of $n$ i.i.d. samples from an unknown distribution $P$, we consider the problem of generating a sample from a distribution that is close to $P$ in total variation distance, under the constraint of differential privacy (DP). We study the problem when $P$ is a multi-dimensional Gaussian distribution, under different assumptions on the information available to the DP mechanism: known covariance, unknown bounded covariance, and unknown unbounded covariance. We present new DP sampling algorithms, and show that they achieve near-optimal sample complexity in the first two settings. Moreover, when $P$ is a product distribution on the binary hypercube, we obtain a pure-DP algorithm, whereas only an approximate-DP algorithm (with slightly worse sample complexity) was previously known.