Star-join queries are a fundamental workload in data warehousing, with wide applications in Online Analytical Processing (OLAP). Two features make them hard to answer privately: the large number of foreign-key constraints, and the asymmetric effect that fact and dimension tables have on neighboring database instances. As a result, even the latest differential privacy (DP) approaches designed specifically for joins suffer from extremely large estimation errors and high computational cost when applied directly to star-join queries. This motivates us to propose DP-starJ, a novel Differentially Private framework for star-Join queries. DP-starJ consists of a series of strategies tailored to the specific features of star-join: 1) we reveal the different effects of fact and dimension tables on neighboring database instances, and accordingly revisit the privacy definitions for different cases of star-join; 2) we propose the Predicate Mechanism (PM), which injects noise into the join procedure through predicate perturbation rather than into the query results; 3) to further improve robustness, we propose a DP-compliant star-join algorithm based on PM that handles various types of star-join tasks. Both theoretical analysis and empirical studies demonstrate that the proposed methods outperform state-of-the-art solutions in accuracy, efficiency, and scalability.
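The abstract says only that PM injects noise into the join procedure instead of into the result. The following sketch illustrates that general idea on a toy star schema, under assumptions of our own. Noise is added to the endpoints of a range predicate on one dimension table, and the star join is then evaluated exactly with the perturbed predicate. The schema, the names (`Sale`, `star_join_count`, `perturbed_range`, `predicate_perturbed_count`), and the choice of Laplace noise on the bounds are all hypothetical. This is not the paper's Predicate Mechanism. The noise scale is not calibrated to any privacy budget, so no differential privacy guarantee is claimed.

```python
import random
from dataclasses import dataclass


# Hypothetical toy star schema: each fact row references two dimension
# tables (dates and stores) through foreign keys.
@dataclass(frozen=True)
class Sale:
    date_id: int
    store_id: int


def laplace(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def star_join_count(facts, date_dim, store_dim, year_range, region):
    """Exact COUNT(*) of the star join, filtered by a range predicate on the
    date dimension's year and an equality predicate on the store's region."""
    lo, hi = year_range
    dates = {k for k, year in date_dim.items() if lo <= year <= hi}
    stores = {k for k, r in store_dim.items() if r == region}
    return sum(1 for f in facts if f.date_id in dates and f.store_id in stores)


def perturbed_range(year_range, scale, rng):
    """Add Laplace noise to both endpoints of a range predicate.

    Illustrative only: `scale` is an arbitrary parameter here, not one
    derived from a privacy budget epsilon.
    """
    lo, hi = year_range
    noisy_lo = round(lo + laplace(scale, rng))
    noisy_hi = round(hi + laplace(scale, rng))
    return min(noisy_lo, noisy_hi), max(noisy_lo, noisy_hi)


def predicate_perturbed_count(facts, date_dim, store_dim, year_range, region,
                              scale, seed=None):
    """Perturb the predicate, then run the star join exactly.

    The result is not noised afterwards. All randomness enters through the
    predicate that the join filters on.
    """
    rng = random.Random(seed)
    noisy_range = perturbed_range(year_range, scale, rng)
    return star_join_count(facts, date_dim, store_dim, noisy_range, region)
```

For example, with `date_dim = {1: 2019, 2: 2020, 3: 2021, 4: 2022}`, `store_dim = {10: "east", 11: "west"}`, and five `Sale` rows, `star_join_count(..., (2020, 2021), "east")` returns the exact count. `predicate_perturbed_count` returns the count for a randomly shifted year range. In this toy version the answer always comes from an actual (noisy) query over the data, rather than being a true answer with noise added on top.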