This article presents a method for answering a finite set of linear queries on a given dataset under differential privacy. We formulate the task as a saddle-point problem, i.e., an optimization problem whose solution is a distribution that minimizes the discrepancy between the answers to the linear queries under the true distribution and the answers under a differentially private distribution. Building on this formulation, we propose two new algorithms for differentially private data release: the first is based on the differentially private Frank-Wolfe method; the second combines randomized smoothing with stochastic convex optimization techniques to solve the saddle-point problem. Whereas previous works assess the accuracy of differentially private algorithms relative to the empirical data distribution, a key contribution of our work is a more natural evaluation of the proposed algorithms' accuracy relative to the true data-generating distribution.
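To make the objects in the abstract concrete, the following is a minimal sketch, not the authors' algorithm and with no privacy mechanism in it. It assumes a finite data domain, represents each linear query as a vector of per-element values, and answers a query as its expectation under a distribution. The function names `answer_queries` and `worst_case_error`, and the toy numbers, are hypothetical. The worst-case gap over all queries is one common way to write the inner maximization of such a saddle-point objective; the outer step would minimize this quantity over candidate distributions.

```python
from typing import List, Sequence


def answer_queries(queries: Sequence[Sequence[float]],
                   dist: Sequence[float]) -> List[float]:
    """Answer each linear query q as the expectation sum_x q[x] * dist[x]."""
    return [sum(qx * px for qx, px in zip(q, dist)) for q in queries]


def worst_case_error(queries: Sequence[Sequence[float]],
                     target: Sequence[float],
                     candidate: Sequence[float]) -> float:
    """Largest absolute gap between query answers under two distributions.

    This is the inner maximum (over queries) of an assumed saddle-point
    objective; a release algorithm would try to make it small.
    """
    target_answers = answer_queries(queries, target)
    candidate_answers = answer_queries(queries, candidate)
    return max(abs(t - c) for t, c in zip(target_answers, candidate_answers))


# Toy example (illustrative values only): a domain with three elements and
# three counting-style queries, each an indicator over a subset of the domain.
queries = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
]
target = [0.5, 0.25, 0.25]      # stands in for the reference distribution
candidate = [0.25, 0.5, 0.25]   # stands in for a released distribution

err = worst_case_error(queries, target, candidate)
print(err)
```

In this sketch the reference distribution is passed in explicitly. Measuring accuracy against the empirical distribution or against the data-generating distribution only changes which vector is supplied as `target`.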