Causal Graph Discovery (CGD) is the process of estimating the underlying probabilistic graphical model that represents the joint distribution of the features of a dataset. CGD algorithms are broadly classified into two categories: (i) constraint-based algorithms, whose outcome depends on conditional independence (CI) tests, and (ii) score-based algorithms, whose outcome depends on an optimized score function. Since sensitive features of observational data are prone to privacy leakage, Differential Privacy (DP) has been adopted to ensure user privacy in CGD. Adding the same amount of noise at every step of this sequential estimation process degrades the predictive performance of the algorithms. Because the initial CI tests in constraint-based algorithms and the later iterations of the optimization process in score-based algorithms are crucial, they need to be more accurate and therefore less noisy. Based on this key observation, we present CURATE (CaUsal gRaph AdapTivE privacy), a DP-CGD framework with adaptive privacy budgeting. In contrast to existing DP-CGD algorithms, which use uniform privacy budgeting across all iterations, CURATE allocates the privacy budget adaptively by minimizing the error probability (for constraint-based algorithms) and maximizing the number of iterations of the optimization problem (for score-based algorithms), while keeping the cumulative leakage bounded. To validate our framework, we present a comprehensive set of experiments on several datasets and show that CURATE achieves higher utility than existing DP-CGD algorithms with less privacy leakage.
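The contrast between uniform and adaptive privacy budgeting can be illustrated with a minimal sketch. This is not CURATE's allocation rule, which the abstract describes as the result of an optimization; the fixed geometric schedule, the `ratio` parameter, and all function names below are assumptions made for illustration only. The sketch relies on two standard facts: under basic sequential composition the per-step budgets add up to the total leakage, and the Laplace mechanism uses a noise scale of sensitivity divided by epsilon.

```python
def uniform_budgets(total_eps, n_steps):
    """Split the total budget equally across all steps (uniform baseline)."""
    return [total_eps / n_steps] * n_steps


def decaying_budgets(total_eps, n_steps, ratio=0.7):
    """Hypothetical geometric schedule: earlier steps get a larger epsilon.

    The weights are normalized so the budgets sum to total_eps, which keeps
    the cumulative leakage bounded under basic sequential composition.
    """
    weights = [ratio ** i for i in range(n_steps)]
    norm = sum(weights)
    return [total_eps * w / norm for w in weights]


def laplace_scale(sensitivity, eps):
    """Noise scale of the Laplace mechanism; a larger eps means less noise."""
    return sensitivity / eps


total_eps, n_steps = 1.0, 5
uniform = uniform_budgets(total_eps, n_steps)
# Front-loaded: more budget for early steps (e.g. initial CI tests).
front_loaded = decaying_budgets(total_eps, n_steps)
# Back-loaded: more budget for late steps (e.g. later optimizer iterations).
back_loaded = front_loaded[::-1]

for name, budgets in [("uniform", uniform),
                      ("front-loaded", front_loaded),
                      ("back-loaded", back_loaded)]:
    scales = [round(laplace_scale(1.0, e), 2) for e in budgets]
    print(name, "eps:", [round(e, 3) for e in budgets], "noise scale:", scales)
```

All three schedules spend the same total budget; they differ only in which steps receive less noise. The front-loaded variant mirrors the constraint-based case and the back-loaded variant mirrors the score-based case described above.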