Star-join queries are a fundamental task in data warehousing and are widely used in On-line Analytical Processing (OLAP) scenarios. Because of the large number of foreign key constraints, and because fact and dimension tables affect neighboring database instances asymmetrically, even the latest differential privacy (DP) methods designed specifically for joins suffer from extremely large estimation errors and high computational cost when applied directly to star-join queries. We therefore propose DP-starJ, a novel differentially private framework for star-join queries. DP-starJ consists of a series of strategies tailored to the specific features of star-joins: 1) we show that fact and dimension tables affect neighboring database instances differently, and accordingly revisit the DP definitions for the different cases of star-join; 2) we propose the Predicate Mechanism (PM), which uses predicate perturbation to inject noise into the join procedure rather than into the results; 3) to further improve robustness, we propose a DP-compliant star-join algorithm, based on PM, for various types of star-join tasks. Both theoretical analysis and an empirical study demonstrate that the proposed methods outperform state-of-the-art solutions in accuracy, efficiency, and scalability.
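To make the contrast between perturbing results and perturbing predicates concrete, the following is a minimal Python sketch on a toy star schema. It is not the paper's Predicate Mechanism: the tables, the function names (`star_join_count`, `output_perturbed_count`, `predicate_perturbed_count`), the `sensitivity` parameter, and the choice to shift range bounds by rounded Laplace noise are all assumptions made for illustration, and the predicate variant is not calibrated to any formal privacy guarantee. It only shows where the noise enters in each style.

```python
import random

# Hypothetical toy star schema: one fact table (sales) whose rows hold
# foreign keys into two dimension tables (customers, dates).
customers = {1: {"region": "EU"}, 2: {"region": "US"}, 3: {"region": "EU"}}
dates = {10: {"year": 2019}, 11: {"year": 2020}, 12: {"year": 2021}}
sales = [(1, 10), (1, 11), (2, 11), (3, 12), (3, 11), (2, 10)]  # (customer_id, date_id)


def star_join_count(region, year_lo, year_hi):
    """Exact COUNT(*) over fact rows whose dimension rows match the predicates."""
    return sum(
        1
        for cust_id, date_id in sales
        if customers[cust_id]["region"] == region
        and year_lo <= dates[date_id]["year"] <= year_hi
    )


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def output_perturbed_count(region, year_lo, year_hi, epsilon, sensitivity, rng):
    """Output perturbation: run the exact join, then add noise to the result.

    `sensitivity` is supplied by the caller (an assumption of this sketch);
    the noise scale grows linearly with it.
    """
    exact = star_join_count(region, year_lo, year_hi)
    return exact + laplace(sensitivity / epsilon, rng)


def predicate_perturbed_count(region, year_lo, year_hi, epsilon, rng):
    """Toy predicate perturbation: add noise to the range bounds, then run
    the exact join with the noisy predicate. Illustrative only; no formal
    privacy guarantee is claimed for this calibration.
    """
    noisy_lo = year_lo + round(laplace(1.0 / epsilon, rng))
    noisy_hi = year_hi + round(laplace(1.0 / epsilon, rng))
    lo, hi = sorted((noisy_lo, noisy_hi))  # keep the range well-formed
    return star_join_count(region, lo, hi)


if __name__ == "__main__":
    rng = random.Random(0)
    print("exact:", star_join_count("EU", 2020, 2021))
    print("output-perturbed:", output_perturbed_count("EU", 2020, 2021, 1.0, 3, rng))
    print("predicate-perturbed:", predicate_perturbed_count("EU", 2020, 2021, 1.0, rng))
```

In the output-perturbed variant the answer is a real number whose error scales with the assumed sensitivity, whereas in the predicate-perturbed variant the answer is always an exact count for some nearby query, so it stays an integer between 0 and the number of fact rows.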