In recent years, differential privacy has emerged as the de facto standard for sharing statistics of datasets while limiting the disclosure of private information about the individuals involved. This is achieved by randomly perturbing the statistics to be published, which in turn leads to a privacy-accuracy trade-off: larger perturbations provide stronger privacy guarantees, but they result in less accurate statistics that offer lower utility to the recipients. Of particular interest are therefore optimal mechanisms that provide the highest accuracy for a pre-selected level of privacy. To date, work in this area has focused on specifying families of perturbations a priori and subsequently proving their asymptotic and/or best-in-class optimality. In this paper, we develop a class of mechanisms that enjoy non-asymptotic and unconditional optimality guarantees. To this end, we formulate the mechanism design problem as an infinite-dimensional distributionally robust optimization problem. We show that the problem affords a strong dual, and we exploit this duality to develop converging hierarchies of finite-dimensional upper and lower bounding problems. Our upper (primal) bounds correspond to implementable perturbations whose suboptimality can be bounded by our lower (dual) bounds. Both bounding problems can be solved within seconds via cutting-plane techniques that exploit the inherent problem structure. Our numerical experiments demonstrate that our perturbations can outperform the best results previously reported in the literature on artificial as well as standard benchmark problems.
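To make the privacy-accuracy trade-off concrete, the sketch below uses the classical Laplace mechanism, a standard example of a perturbation family fixed a priori. It is not the optimized mechanism developed in this paper. For a statistic with L1 sensitivity Δ, adding Laplace noise with scale b = Δ/ε satisfies ε-differential privacy, and the expected absolute error is exactly b. Hence halving ε (stronger privacy) doubles the expected error. The function names, the sample size, and the statistic value 42.0 are hypothetical choices made for illustration only.

```python
import random
import statistics


def laplace_mechanism(value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release value + Laplace(0, b) noise with b = sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    b = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return value + noise


def expected_abs_error(sensitivity: float, epsilon: float) -> float:
    """E|noise| for Laplace(0, b) is exactly b = sensitivity / epsilon."""
    return sensitivity / epsilon


def empirical_abs_error(sensitivity: float, epsilon: float,
                        n: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean absolute error of the mechanism."""
    rng = random.Random(seed)
    true_value = 42.0  # hypothetical statistic, e.g. a count query
    errors = [abs(laplace_mechanism(true_value, sensitivity, epsilon, rng) - true_value)
              for _ in range(n)]
    return statistics.fmean(errors)


if __name__ == "__main__":
    # Smaller epsilon (stronger privacy) yields larger expected error.
    for eps in (0.1, 0.5, 1.0, 2.0):
        print(f"epsilon={eps}: expected |error|={expected_abs_error(1.0, eps):.2f}, "
              f"empirical={empirical_abs_error(1.0, eps):.2f}")
```

Mechanisms such as this one are proven optimal only within their own family or asymptotically. The approach summarized above instead searches over perturbation distributions directly and certifies how far the resulting mechanism can be from the true optimum.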