In this work we consider the problem of differentially private computation of quantiles of a dataset, especially the highest quantiles such as the maximum, when the range of the data is unbounded. We show that this can be done efficiently through a simple invocation of $\texttt{AboveThreshold}$, the subroutine that is called iteratively in the fundamental Sparse Vector Technique, even when there is no upper bound on the data. In particular, we show that this procedure gives more accurate and robust estimates of the highest quantiles. These estimates apply to clipping, which is essential for differentially private sum and mean estimation. In addition, we show how two invocations can handle the fully unbounded setting, where the data has neither an upper nor a lower bound. As part of our study, we give an improved analysis of $\texttt{AboveThreshold}$ that strengthens the privacy guarantees of the widely used Sparse Vector Technique, which is of independent interest. We also give a more general characterization of the privacy loss of $\texttt{AboveThreshold}$, which we apply directly to our method to obtain improved privacy guarantees. Our algorithm requires only one $O(n)$ pass through the data, which need not be sorted, and each subsequent query takes $O(1)$ time. We empirically compare our unbounded algorithm with state-of-the-art algorithms for the bounded setting. For inner quantiles, our method often performs better on non-synthetic datasets. For the maximal quantiles, which we apply to differentially private sum computation, our method performs significantly better.
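The abstract does not spell out the query sequence, so the following is only a minimal sketch of how a single $\texttt{AboveThreshold}$ invocation can locate a quantile without an upper bound. It rests on several assumptions that are not taken from the paper. It assumes nonnegative data and geometrically increasing candidate thresholds $\beta^i - 1$. The name `unbounded_quantile` and the parameters `beta` and `rng` are hypothetical. It also uses the textbook $\texttt{AboveThreshold}$ noise scales, $\mathrm{Lap}(2/\varepsilon)$ on the threshold and $\mathrm{Lap}(4/\varepsilon)$ on each sensitivity-1 counting query, rather than the improved analysis described above. The point is to show the cost structure: one $O(n)$ pass builds a histogram over the geometric buckets, and each query $f_i(D) = |\{x \in D : x \le \beta^i - 1\}|$ is then an $O(1)$ running-sum update.

```python
import itertools
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def unbounded_quantile(data, q, eps, beta=1.01, rng=None):
    """Hypothetical sketch: AboveThreshold over thresholds beta**i - 1, i = 0, 1, 2, ...

    Assumes nonnegative data (negative values are treated as 0) and uses the
    textbook AboveThreshold noise scales, not the paper's improved analysis.
    """
    if rng is None:
        rng = random.Random()
    # Single O(n) pass over unsorted data: bucket i is the smallest i with x <= beta**i - 1.
    counts = {}
    for x in data:
        i = 0 if x <= 0 else math.ceil(math.log(x + 1, beta))
        counts[i] = counts.get(i, 0) + 1
    # Noisy threshold: the target rank q * n.
    noisy_threshold = q * len(data) + laplace(2.0 / eps, rng)
    cumulative = 0
    for i in itertools.count():
        # O(1) per query: number of points <= beta**i - 1.
        cumulative += counts.get(i, 0)
        if cumulative + laplace(4.0 / eps, rng) >= noisy_threshold:
            # Halt at the first noisy count above the noisy threshold.
            return beta ** i - 1
```

For example, `unbounded_quantile(list(range(1000)), 0.5, eps=1.0)` returns a noisy estimate of the median. Because the histogram is built in a single pass, the data never needs to be sorted, and every candidate threshold after the first costs one dictionary lookup. Once the running count reaches $n$, each later query exceeds the noisy threshold with constant probability, so the loop terminates with probability 1. The fully unbounded (two-sided) case and the clipping step for sums are not covered by this sketch.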