Noisy marginals are a common form of confidentiality-protecting data release and are useful for many downstream tasks, such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms. We propose ResidualPlanner, a matrix mechanism for marginals with Gaussian noise that is both optimal and scalable. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances, whereas prior work was restricted to a single predefined objective function. ResidualPlanner can optimize the accuracy of marginals in large-scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It runs on datasets with 100 attributes in a couple of minutes. Furthermore, ResidualPlanner can efficiently compute variance/covariance values for each marginal, a task on which prior methods quickly run out of memory even for relatively small datasets.
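To make the notion of a noisy marginal concrete, the following minimal sketch computes an exact marginal (a contingency table over a subset of attributes) and perturbs each cell with independent zero-mean Gaussian noise, so the noisy answer is unbiased. This is only the simplest baseline, not ResidualPlanner's optimized strategy. The function names `marginal` and `noisy_marginal`, the toy domain sizes, and the value of `sigma` are all assumptions chosen for illustration; in particular, `sigma` is not calibrated to any privacy budget here.

```python
import numpy as np

def marginal(data, domain, attrs):
    """Exact contingency table over the attributes listed in `attrs`."""
    shape = tuple(domain[a] for a in attrs)
    table = np.zeros(shape, dtype=float)
    # Add 1 to the cell indexed by each record's values on `attrs`.
    np.add.at(table, tuple(data[:, a] for a in attrs), 1.0)
    return table

def noisy_marginal(data, domain, attrs, sigma, rng):
    """Marginal plus i.i.d. zero-mean Gaussian noise per cell (unbiased).

    `sigma` is a hypothetical noise scale; a real release would derive it
    from the privacy parameters and the sensitivity of the query set.
    """
    exact = marginal(data, domain, attrs)
    return exact + rng.normal(0.0, sigma, size=exact.shape)

# Toy dataset: 1000 records over three attributes with 2, 3, and 4 values.
rng = np.random.default_rng(0)
domain = [2, 3, 4]
data = np.column_stack([rng.integers(0, d, size=1000) for d in domain])

exact = marginal(data, domain, [0, 2])                 # 2 x 4 table
noisy = noisy_marginal(data, domain, [0, 2], 5.0, rng)  # same shape, perturbed
```

In this baseline every cell of the marginal has variance `sigma ** 2`. A matrix mechanism instead chooses which linear queries to perturb and how much noise each receives, then reconstructs the marginals from those answers, which is where optimizing a convex function of the marginal variances comes in.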