We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.
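One way to write this benchmark down, as a sketch only: the loss $\ell$, the target property $\theta$, the removal budget $k$, and the competitive factor $\alpha$ below are illustrative notation assumed here, and the exact error measure and quantifiers in the formal definition may differ. For a dataset $D$ of size $n$, the benchmark risk is

$$
R_k(D) \;=\; \inf_{A' \text{ private}} \;\; \max_{D' \subseteq D,\; |D'| \ge n-k} \;\; \mathbb{E}\big[\ell\big(A'(D'),\, \theta(D')\big)\big],
$$

and a private algorithm $A$ is instance-optimal up to the factor $\alpha$ if $\mathbb{E}\big[\ell(A(D), \theta(D))\big] \le \alpha \cdot R_k(D)$ for every $D$. The infimum is taken separately for each $D$, which reflects that the benchmark algorithm knows $D$ in advance; the maximum ranges only over large subsets of $D$, never over supersets, so the benchmark is not penalized for extreme points that could be added.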