This work studies the estimation of many statistical quantiles under differential privacy. More precisely, given a distribution and access to i.i.d. samples from it, we study the estimation of the inverse of its cumulative distribution function (the quantile function) at specific points. This task is of key importance in, for instance, private data generation. We present two approaches. The first privately estimates the empirical quantiles of the samples and uses the result as an estimator of the quantiles of the distribution. In particular, we study the statistical properties of the recently published algorithm of Kaplan et al. (2022), which privately estimates the quantiles recursively. The second approach uses density estimation techniques to estimate the quantile function uniformly on an interval. We show that there is a tradeoff between the two methods: when many quantiles are to be estimated, it is better to estimate the density than to estimate the quantile function at specific points.
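To make the first approach concrete, the following is a minimal sketch of privately estimating a single empirical quantile with the exponential mechanism. It is not the recursive algorithm of Kaplan et al. (2022), only an illustration of the kind of single-quantile building block such methods can rely on. The function name `private_quantile`, its parameters, and the assumption that the data lie in a known interval `[lower, upper]` with `lower < upper` are all hypothetical choices made for this example.

```python
import numpy as np


def private_quantile(samples, q, epsilon, lower, upper, rng=None):
    """Sketch: exponential-mechanism estimate of the q-th empirical quantile.

    Assumes (hypothetically) that the data lie in [lower, upper], lower < upper.
    Illustrative only; not a vetted differentially private implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Clip to the assumed range and sort.
    x = np.sort(np.clip(np.asarray(samples, dtype=float), lower, upper))
    n = x.size
    # The n sorted points split [lower, upper] into n + 1 intervals.
    edges = np.concatenate(([lower], x, [upper]))
    widths = np.diff(edges)
    # Utility of interval i: minus the distance between its rank i and
    # the target rank q * n. Changing one sample moves it by at most 1.
    utility = -np.abs(np.arange(n + 1) - q * n)
    # Log-weights: interval length times exp(epsilon * utility / 2).
    with np.errstate(divide="ignore"):  # zero-width intervals get log(0) = -inf
        log_w = np.log(widths) + 0.5 * epsilon * utility
    log_w -= log_w.max()  # stabilise before exponentiating
    probs = np.exp(log_w)
    probs /= probs.sum()
    # Pick an interval, then a uniform point inside it.
    i = rng.choice(n + 1, p=probs)
    return rng.uniform(edges[i], edges[i + 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.uniform(0.0, 1.0, size=2000)
    print(private_quantile(data, 0.5, epsilon=1.0, lower=0.0, upper=1.0, rng=rng))
```

Estimating many quantiles by calling such a routine independently would require splitting the privacy budget across the calls, which is one motivation for comparing point-wise estimation with the density-based approach.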