We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.
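To make the subset-based viewpoint concrete, the following minimal sketch computes how far the mean of a real-valued dataset $D$ can move when at most $k$ of its existing points are removed. This is only an illustration of the kind of quantity such a benchmark depends on; it is not the formal definition or the private algorithm from this work. The function names and the parameter `k` are hypothetical and introduced here for illustration.

```python
from itertools import combinations
from statistics import fmean


def subset_mean_spread(data, k):
    """Largest |mean(D') - mean(D)| over subsets D' that drop at most k points of D.

    Illustrative helper (hypothetical name), not the paper's definition.
    """
    n = len(data)
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < len(data)")
    if k == 0:
        return 0.0
    xs = sorted(data)
    full = fmean(xs)
    # Dropping the k largest points lowers the mean the most;
    # dropping the k smallest points raises it the most.
    lowest = fmean(xs[: n - k])
    highest = fmean(xs[k:])
    return max(full - lowest, highest - full)


def subset_mean_spread_bruteforce(data, k):
    """Same quantity by enumerating every subset; only feasible for tiny inputs."""
    n = len(data)
    full = fmean(data)
    best = 0.0
    for removed in range(k + 1):
        for kept in combinations(data, n - removed):
            best = max(best, abs(fmean(kept) - full))
    return best


if __name__ == "__main__":
    example = [1.0, 2.0, 3.0, 4.0, 100.0]
    print(subset_mean_spread(example, 1))
    print(subset_mean_spread_bruteforce(example, 1))
```

The quantity above depends only on points already present in $D$. By contrast, a notion of sensitivity that also allows adding arbitrary points is unbounded for the mean over the reals, which is the gap between removal-only and add-or-remove neighborhoods that the abstract refers to.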