Matrix mechanisms are often used to provide unbiased differentially private query answers when publishing statistics or creating synthetic data. Recent work has developed matrix mechanisms, such as ResidualPlanner and Weighted Fourier Factorizations, that scale to high-dimensional datasets while providing optimality guarantees for workloads such as marginals and circular product queries. These mechanisms operate by adding noise to a linearly independent set of queries that can compactly represent the desired workloads. In this paper, we present QuerySmasher, an alternative scalable approach based on a divide-and-conquer strategy. Given a workload that can be answered from various data marginals, QuerySmasher splits each query into sub-queries and reassembles the pieces into mutually orthogonal sub-workloads. These sub-workloads represent small, low-dimensional problems that can be independently and optimally answered by existing low-dimensional matrix mechanisms. QuerySmasher then stitches these solutions together to answer the queries in the original workload. We show that QuerySmasher subsumes prior work such as ResidualPlanner (RP), ResidualPlanner+ (RP+), and Weighted Fourier Factorizations (WFF). We prove that, under sum squared error, it can dominate those approaches for all workloads. We also experimentally demonstrate the scalability and accuracy of QuerySmasher.
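For readers unfamiliar with the general matrix mechanism template that the abstract refers to, the following is a minimal sketch of it, not an implementation of QuerySmasher, ResidualPlanner, or WFF. It assumes a workload matrix `W`, a strategy matrix `B` with linearly independent rows whose row space contains the rows of `W`, and Gaussian noise with a standard deviation `sigma` that is taken as given (calibrating `sigma` to a privacy budget is out of scope here). The function names `matrix_mechanism` and `expected_sse` are hypothetical and chosen only for illustration.

```python
import numpy as np


def matrix_mechanism(W, B, x, sigma, rng):
    """Answer workload W on data vector x by noising the strategy queries B.

    Assumes every row of W lies in the row space of B, so that
    W = (W @ pinv(B)) @ B and the reconstructed answers are unbiased.
    """
    # Noisy answers to the strategy queries (sigma is assumed to be
    # already calibrated to the desired privacy level).
    noisy_strategy_answers = B @ x + rng.normal(0.0, sigma, size=B.shape[0])
    # Linear reconstruction of the workload answers from the strategy answers.
    reconstruction = W @ np.linalg.pinv(B)
    return reconstruction @ noisy_strategy_answers


def expected_sse(W, B, sigma):
    """Expected sum squared error: sigma^2 * ||W pinv(B)||_F^2."""
    reconstruction = W @ np.linalg.pinv(B)
    return sigma ** 2 * np.linalg.norm(reconstruction, "fro") ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.array([3.0, 1.0, 4.0, 1.0])   # toy histogram over 4 cells
    W = np.tril(np.ones((4, 4)))         # prefix-sum workload
    B = np.eye(4)                        # identity strategy: noise each cell
    print(matrix_mechanism(W, B, x, sigma=1.0, rng=rng))
    print(expected_sse(W, B, sigma=1.0))
```

Because the reconstruction is linear and the noise has zero mean, the answers are unbiased whenever `W` is representable from `B`; the choice of `B` only affects the error, which is what strategy optimization targets.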