In recent years, differential privacy has emerged as the de facto standard for sharing statistics of datasets while limiting the disclosure of private information about the involved individuals. This is achieved by randomly perturbing the statistics to be published, which in turn leads to a privacy-accuracy trade-off: larger perturbations provide stronger privacy guarantees, but they result in less accurate statistics that offer lower utility to the recipients. Of particular interest are therefore optimal mechanisms that provide the highest accuracy for a pre-selected level of privacy. To date, work in this area has focused on specifying families of perturbations a priori and subsequently proving their asymptotic and/or best-in-class optimality. In this paper, we develop a class of mechanisms that enjoy non-asymptotic and unconditional optimality guarantees. To this end, we formulate the mechanism design problem as an infinite-dimensional distributionally robust optimization problem. We show that the problem affords a strong dual, and we exploit this duality to develop converging hierarchies of finite-dimensional upper and lower bounding problems. Our upper (primal) bounds correspond to implementable perturbations whose suboptimality can be bounded by our lower (dual) bounds. Both bounding problems can be solved within seconds via cutting plane techniques that exploit the inherent problem structure. Our numerical experiments demonstrate that our perturbations can outperform the previously best results from the literature on artificial as well as standard benchmark problems.
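To make the privacy-accuracy trade-off described above concrete, the following minimal sketch uses the classical Laplace mechanism, a standard baseline perturbation from the differential privacy literature. It is not the optimized mechanism developed in this paper. The function names, the query value of 100, and the unit sensitivity are illustrative assumptions. The sketch adds Laplace noise with scale `sensitivity / epsilon` to a scalar statistic and estimates the resulting mean absolute error by simulation.

```python
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Perturb a scalar statistic with Laplace noise (illustrative baseline).

    The noise scale sensitivity / epsilon is the classical choice that yields
    epsilon-differential privacy for a query with the given L1 sensitivity.
    """
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def mean_abs_error(epsilon, sensitivity=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the expected absolute error of the mechanism."""
    rng = random.Random(seed)
    true_value = 100.0  # hypothetical statistic, e.g. a count query
    total = 0.0
    for _ in range(trials):
        released = laplace_mechanism(true_value, sensitivity, epsilon, rng)
        total += abs(released - true_value)
    return total / trials


if __name__ == "__main__":
    # Smaller epsilon means stronger privacy and larger perturbations.
    for eps in (0.1, 0.5, 1.0, 2.0):
        print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error(eps):.3f}")
```

For Laplace noise the expected absolute error equals the noise scale, so halving epsilon roughly doubles the error. An optimal mechanism in the sense of the abstract would aim for a lower error than such a fixed-family baseline at the same privacy level.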