Synthetic data are an attractive means of enabling privacy in data sharing. A fundamental question is how similar privacy-preserving synthetic data are to the true data. Using metric privacy, an effective generalization of differential privacy beyond the discrete setting, we pose the problem of characterizing the optimal privacy-accuracy tradeoff in terms of the metric geometry of the underlying space. We provide a partial solution to this problem in terms of the "entropic scale", a quantity that captures the multiscale geometry of a metric space through the behavior of its packing numbers. We illustrate the applicability of our privacy-accuracy tradeoff framework with a diverse set of examples of metric spaces.