Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed set of queries specified a priori. Moreover, there are no utility guarantees for more complex, but very common, machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data for general compact metric spaces. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmically slowly.
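For readers unfamiliar with metric privacy, the notion is usually stated as follows. This is a sketch of the standard definition rather than a quotation from the paper; the symbols $\mathcal{M}$ (a randomized algorithm), $\rho$ (a metric on the input space), and $\alpha$ (the privacy parameter) are notation assumed here.

```latex
% Metric privacy (standard form; notation assumed, not taken from the abstract).
% A randomized algorithm \mathcal{M} is \alpha-metrically private with respect to
% a metric \rho on its input space if, for all inputs x, x' and all measurable
% sets S of outputs,
\[
  \mathbb{P}\{\mathcal{M}(x) \in S\}
  \;\le\;
  e^{\alpha\,\rho(x,\,x')}\;
  \mathbb{P}\{\mathcal{M}(x') \in S\}.
\]
% Taking \rho(x, x') to be the number of entries in which the data sets x and x'
% differ (the Hamming distance) recovers \alpha-differential privacy: for
% neighboring data sets, \rho(x, x') = 1 and the bound reduces to the factor
% e^{\alpha}.
```

Under this reading, differential privacy is the special case in which the metric only counts differing records, which is the sense in which metric privacy generalizes it.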