Metric Differential Privacy (mDP) extends Differential Privacy (DP) into a new paradigm of data perturbation. It is designed to protect secret data represented in a general metric space, such as text data encoded as word embeddings or geo-location data on a road network or grid map. A widely used method for deriving an optimal data perturbation mechanism under mDP is linear programming (LP), which, however, can suffer from a polynomial explosion in the number of decision variables, rendering it impractical for large-scale mDP. In this paper, we develop a new computation framework to enhance the scalability of LP-based mDP. Exploiting the connections that the mDP constraints establish among secret records, we partition the original secret dataset into subsets. Building on this partition, we reformulate the LP problem for mDP and solve it via Benders Decomposition, which consists of two stages: (1) a master program that manages the perturbation calculation across subsets and (2) a set of subproblems, each managing the perturbation derivation within a subset. Experimental results on multiple datasets, including geo-location data on road networks and grid maps, text data, and synthetic data, demonstrate the superior scalability and efficiency of the proposed mechanism.
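To make the scalability concern concrete, the following is a minimal sketch, not the framework proposed in the paper. It assumes the standard mDP condition Pr[M(x) = y] <= exp(epsilon * d(x, x')) * Pr[M(x') = y], a toy metric space of points on a line with d(x, y) = |x - y|, and a simple exponential-style mechanism used only as a feasible example (it is not the LP optimum). The function names `exponential_mechanism`, `satisfies_mdp`, and `lp_size` are hypothetical, and `lp_size` counts one constraint per ordered pair of distinct secrets and per output, which may differ from the reduced constraint sets used in practice.

```python
import math
from itertools import product


def exponential_mechanism(points, epsilon):
    """Build z[i][k] = Pr[report points[k] | secret is points[i]].

    Assumption: weights exp(-epsilon * d / 2) with d(x, y) = |x - y|.
    This is an illustrative feasible mechanism, not an optimal one.
    """
    z = []
    for x in points:
        weights = [math.exp(-epsilon * abs(x - y) / 2.0) for y in points]
        total = sum(weights)
        z.append([w / total for w in weights])  # each row sums to 1
    return z


def satisfies_mdp(z, points, epsilon, tol=1e-12):
    """Check z[i][k] <= exp(epsilon * d(i, j)) * z[j][k] for all i, j, k."""
    n = len(points)
    for i, j, k in product(range(n), repeat=3):
        bound = math.exp(epsilon * abs(points[i] - points[j])) * z[j][k]
        if z[i][k] > bound + tol:
            return False
    return True


def lp_size(n_secrets, n_outputs):
    """Count decision variables and mDP constraints of the naive LP.

    Assumption: one variable per (secret, output) pair and one constraint
    per ordered pair of distinct secrets and per output.
    """
    variables = n_secrets * n_outputs
    mdp_constraints = n_secrets * (n_secrets - 1) * n_outputs
    return variables, mdp_constraints
```

Under these assumptions, `lp_size(1000, 1000)` reports 1,000,000 decision variables and 999,000,000 mDP constraints, which illustrates why a monolithic LP becomes impractical at scale and why splitting the secret records into subsets is attractive.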