We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of the data points already in $D$. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.
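To make the benchmark concrete, the following is one plausible formalization consistent with the description above; the symbols $\mathrm{err}$, $\mathrm{OPT}_k$, and the subset parameter $k$ are illustrative notation rather than definitions quoted from the paper. Writing $\mathrm{err}(\mathcal{B}, S)$ for the expected error of an algorithm $\mathcal{B}$ run on a dataset $S$, the benchmark for a dataset $D$ of size $n$ is
\[
\mathrm{OPT}_k(D) \;=\; \inf_{\mathcal{B}\ \text{private}} \;\; \max_{\substack{S \subseteq D \\ |S| \,\ge\, n-k}} \mathrm{err}(\mathcal{B}, S),
\]
where the infimum ranges over differentially private algorithms that may be tailored to $D$. An algorithm $\mathcal{A}$ is then instance optimal in this sense if $\mathrm{err}(\mathcal{A}, D) = O(\mathrm{OPT}_k(D))$ simultaneously for every dataset $D$.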