Estimating causal effects from randomized experiments is only feasible if participants agree to reveal their potentially sensitive responses. Among the many ways of ensuring privacy, label differential privacy is a widely used measure of an algorithm's privacy guarantee, and it may encourage participants to share responses without running the risk of de-anonymization. Many differentially private mechanisms inject noise into the original dataset to achieve this guarantee, which increases the variance of most statistical estimators and makes the precise measurement of causal effects difficult: there is a fundamental privacy-variance trade-off in performing causal analyses on differentially private data. With the aim of achieving lower variance for stronger privacy guarantees, we propose a new differential privacy mechanism, "Cluster-DP", which leverages any given cluster structure of the data while still allowing for the estimation of causal effects. We show that, depending on an intuitive measure of cluster quality, we can reduce the variance loss while maintaining our privacy guarantees. We compare the performance of "Cluster-DP", theoretically and empirically, to that of its unclustered version and of a more extreme uniform-prior version that does not use any of the original response distribution, both of which are special cases of the "Cluster-DP" algorithm.
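The privacy-variance trade-off described above can be illustrated with a minimal sketch. It uses plain binary randomized response, a standard label-DP mechanism, and not the Cluster-DP mechanism itself. The function names, the Bernoulli outcome model, and the parameter values are all assumptions made for this illustration.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true binary label under randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(label, epsilon, rng):
    """Report the true 0/1 label with probability p, otherwise report its flip."""
    return label if rng.random() < keep_probability(epsilon) else 1 - label


def debiased_mean(noisy_labels, epsilon):
    """Unbiased estimate of the true label mean from privatized labels.

    E[noisy] = (2p - 1) * mu + (1 - p), so we invert that affine map.
    """
    p = keep_probability(epsilon)
    noisy_mean = sum(noisy_labels) / len(noisy_labels)
    return (noisy_mean - (1.0 - p)) / (2.0 * p - 1.0)


def variance_inflation(epsilon):
    """Factor 1 / (2p - 1)^2 by which debiasing scales the noisy mean's variance."""
    p = keep_probability(epsilon)
    return 1.0 / (2.0 * p - 1.0) ** 2


def estimate_ate(epsilon, n_per_arm=10_000, seed=0):
    """Difference in debiased means on simulated data (hypothetical setup).

    Treated responses are Bernoulli(0.6) and control responses are
    Bernoulli(0.4), so the true average treatment effect is 0.2.
    """
    rng = random.Random(seed)
    treated = [int(rng.random() < 0.6) for _ in range(n_per_arm)]
    control = [int(rng.random() < 0.4) for _ in range(n_per_arm)]
    noisy_treated = [randomized_response(y, epsilon, rng) for y in treated]
    noisy_control = [randomized_response(y, epsilon, rng) for y in control]
    return debiased_mean(noisy_treated, epsilon) - debiased_mean(noisy_control, epsilon)


if __name__ == "__main__":
    for eps in (0.5, 1.0, 3.0):
        print(f"epsilon={eps}: ATE estimate={estimate_ate(eps):.3f}, "
              f"variance inflation={variance_inflation(eps):.1f}")
```

Smaller values of `epsilon` (stronger privacy) give a larger inflation factor, so the same sample size yields a noisier estimate of the treatment effect. This is the cost that a mechanism exploiting cluster structure aims to reduce.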