Differential privacy (DP) enables private data analysis. In a typical DP deployment, controllers manage individuals' sensitive data and are responsible for answering analysts' queries while protecting individuals' privacy. Controllers do so by choosing the privacy parameter $ε$, which controls the degree of privacy for all individuals in all possible datasets. However, choosing $ε$ is challenging because it is difficult to interpret what a given choice implies for the privacy of the within-dataset individuals. To address this challenge, we first derive a relative disclosure risk indicator (RDR) that indicates the impact of choosing $ε$ on the within-dataset individuals' disclosure risk. We then design an algorithm that finds $ε$ based on controllers' privacy preferences, expressed as a function of the within-dataset individuals' RDRs, and an alternative algorithm that finds and releases $ε$ while satisfying DP. Lastly, we propose a solution that bounds the total privacy leakage when the algorithm is used to answer multiple queries, without requiring controllers to set the total privacy budget. We evaluate our contributions through an IRB-approved user study, which shows that the RDR helps controllers choose $ε$, and through experimental evaluations, which show that our algorithms are efficient and scalable.
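To make the role of $ε$ concrete, the sketch below uses the standard Laplace mechanism on a counting query. This is a textbook DP mechanism and is an assumption of this example, not the RDR or the $ε$-selection algorithm described in the abstract. The names `laplace_noise`, `noise_scale`, and `private_count` are hypothetical. It illustrates only that a smaller $ε$ means a larger noise scale and therefore a noisier answer.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noise_scale(sensitivity: float, epsilon: float) -> float:
    """Laplace mechanism noise scale: sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer a counting query with Laplace noise (illustrative sketch only).

    A counting query has sensitivity 1: adding or removing one individual
    changes the true count by at most 1.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(noise_scale(1.0, epsilon), rng)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    ages = [23, 35, 41, 58, 62, 19, 77]  # made-up toy data
    for eps in (0.1, 1.0, 10.0):
        answer = private_count(ages, lambda a: a >= 40, eps, rng)
        print(f"epsilon={eps}: noisy count = {answer:.2f}")
```

The noise scale depends only on $ε$ and the query's sensitivity, not on which individuals are actually in the dataset. This is one way to see why a single choice of $ε$ applies to all individuals in all possible datasets, and why its effect on specific within-dataset individuals is not obvious from $ε$ alone.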