Noisy marginals are a common form of confidentiality-protecting data release and are useful for many downstream tasks such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms. We propose ResidualPlanner, a matrix mechanism for marginals with Gaussian noise that is both optimal and scalable. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances (prior work was restricted to just one predefined objective function). ResidualPlanner can optimize the accuracy of marginals in large-scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It even runs on datasets with 100 attributes in a couple of minutes. Furthermore, ResidualPlanner can efficiently compute variance/covariance values for each marginal (prior methods quickly run out of memory, even for relatively small datasets).
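To make "noisy marginals with Gaussian noise" concrete, here is a minimal sketch of the simplest possible release: compute one exact marginal (a table of counts over all value combinations of a subset of attributes) and add independent Gaussian noise to every cell. This is *not* ResidualPlanner's optimized strategy; a matrix mechanism would instead add noise to a carefully chosen set of strategy queries and reconstruct the marginals from them. The function names (`marginal`, `noisy_marginal`, `zcdp_rho`) are hypothetical, and the privacy accounting assumes add/remove-one-record neighbors (so one marginal has L2 sensitivity 1) and zero-concentrated differential privacy, where Gaussian noise with standard deviation sigma gives rho = sensitivity^2 / (2 sigma^2).

```python
import numpy as np


def marginal(data, attrs, domain):
    """Exact marginal over the attributes in `attrs`.

    data: int array of shape (n_records, n_attributes), where column j
          takes values in range(domain[j]).
    Returns an array of counts with one cell per value combination.
    """
    shape = tuple(domain[a] for a in attrs)
    counts = np.zeros(shape, dtype=float)
    # Increment one cell per record; np.add.at handles repeated indices.
    np.add.at(counts, tuple(data[:, a] for a in attrs), 1.0)
    return counts


def noisy_marginal(data, attrs, domain, sigma, rng):
    """Unbiased noisy marginal: each cell gets independent N(0, sigma^2) noise."""
    exact = marginal(data, attrs, domain)
    return exact + rng.normal(0.0, sigma, size=exact.shape)


def zcdp_rho(sigma, l2_sensitivity=1.0):
    """zCDP cost of the Gaussian mechanism (assumes add/remove neighbors)."""
    return l2_sensitivity**2 / (2.0 * sigma**2)


if __name__ == "__main__":
    data = np.array([[0, 1, 2], [1, 1, 0], [0, 0, 2], [0, 1, 1]])
    domain = (2, 2, 3)
    rng = np.random.default_rng(0)
    print(marginal(data, (0, 1), domain))
    print(noisy_marginal(data, (0, 1), domain, sigma=1.0, rng=rng))
    print(zcdp_rho(1.0))
```

Because the noise has mean zero, every noisy cell is an unbiased estimate of its true count, and in this naive scheme the noise covariance of a marginal is simply sigma^2 times the identity. The point of an optimized matrix mechanism is to spend the same privacy budget more efficiently across many overlapping marginals, which makes the resulting variances and covariances nontrivial to compute.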