This work studies the estimation of many statistical quantiles under differential privacy. More precisely, given a distribution and access to i.i.d. samples from it, we study the estimation of the inverse of its cumulative distribution function (the quantile function) at specific points. This task is of key importance in private data generation, for instance. We present two different approaches. The first privately estimates the empirical quantiles of the samples and uses the result as an estimator of the quantiles of the distribution. In particular, we study the statistical properties of the recently published algorithm of Kaplan et al. (2022), which privately estimates the quantiles recursively. The second approach uses density estimation techniques to estimate the quantile function uniformly on an interval. We show that there is a tradeoff between the two methods: when many quantiles are to be estimated, it is better to estimate the density than to estimate the quantile function at specific points.
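As a rough illustration of the first approach, the Python sketch below estimates a single empirical quantile with the exponential mechanism and then reuses it recursively for several quantiles, loosely in the spirit of the recursive scheme attributed to Kaplan et al. (2022). It is not the authors' code. The function names `private_quantile` and `private_quantiles`, the publicly known bounds `lower` and `upper`, and the even split of the privacy budget across recursion levels are all assumptions made for illustration.

```python
import math
import random


def private_quantile(data, q, epsilon, lower, upper, rng):
    """Exponential-mechanism estimate of the q-th empirical quantile.

    Assumes `lower` and `upper` are public bounds; data are clipped to them.
    """
    if upper < lower:
        raise ValueError("upper must be >= lower")
    if upper == lower:
        return lower  # degenerate range: only one possible output
    n = len(data)
    # Sorted, clipped samples padded with the public endpoints.
    z = [lower] + sorted(min(max(x, lower), upper) for x in data) + [upper]
    # Any point in interval [z[i], z[i+1]] has exactly i samples below it,
    # so its utility is -|i - q*n| (sensitivity 1).
    log_w = []
    for i in range(n + 1):
        width = z[i + 1] - z[i]
        if width > 0:
            log_w.append(math.log(width) - 0.5 * epsilon * abs(i - q * n))
        else:
            log_w.append(-math.inf)  # empty interval gets zero weight
    top = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(lw - top) for lw in log_w]
    i = rng.choices(range(n + 1), weights=weights, k=1)[0]
    return rng.uniform(z[i], z[i + 1])


def private_quantiles(data, qs, epsilon, lower, upper, rng):
    """Recursive estimate of several quantiles (illustrative sketch).

    Duplicated levels in `qs` are merged; levels must lie strictly in (0, 1).
    """
    qs = sorted(set(qs))
    if not qs:
        return []
    if any(not 0.0 < t < 1.0 for t in qs):
        raise ValueError("quantile levels must lie strictly between 0 and 1")
    # Assumption: budget split evenly over the recursion depth, which is
    # floor(log2(m)) + 1 for m quantiles.
    eps_level = epsilon / len(qs).bit_length()

    def recurse(points, targets, lo, hi):
        if not targets:
            return []
        mid = len(targets) // 2
        p = targets[mid]
        v = private_quantile(points, p, eps_level, lo, hi, rng)
        # Split the data at v and rescale the remaining levels to each half.
        left = recurse([x for x in points if x < v],
                       [t / p for t in targets[:mid]], lo, v)
        right = recurse([x for x in points if x > v],
                        [(t - p) / (1.0 - p) for t in targets[mid + 1:]], v, hi)
        return left + [v] + right

    return recurse(list(data), qs, lower, upper)
```

A call such as `private_quantiles(samples, [0.25, 0.5, 0.75], 1.0, 0.0, 1.0, random.Random())` returns the estimates in increasing order. Smaller values of `epsilon` give noisier estimates, and the sketch makes no claim about the statistical rates analysed in the paper.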