We propose Bayesian methods to assess the statistical disclosure risk of data released under zero-concentrated differential privacy, focusing on settings with a strong hierarchical structure and categorical variables with many levels. We assess risk by positing Bayesian intruders with varying amounts of prior information and measuring the distance between their posterior and prior distributions. We discuss applications of these risk assessment methods to differentially private data releases from the 2020 decennial census and perform simulation studies using public individual-level data from the 1940 decennial census. In these studies, we examine how the data holder's choice of privacy parameter affects the disclosure risk, and we quantify the increase in risk when a hypothetical intruder incorporates substantial amounts of hierarchical information.
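The prior-to-posterior comparison described above can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the method of the paper: it assumes a single categorical variable released as a noisy histogram, continuous Gaussian noise calibrated to a zero-concentrated differential privacy parameter `rho` via the standard relation rho = sensitivity^2 / (2 sigma^2), an intruder who knows every record except the target's, and total variation as the distance between prior and posterior. All function names and parameters are hypothetical and chosen for illustration only.

```python
import math
import random


def gaussian_sigma(rho, l2_sensitivity=1.0):
    """Noise scale of a Gaussian mechanism satisfying rho-zCDP.

    Uses the standard relation rho = sensitivity^2 / (2 * sigma^2).
    The default sensitivity of 1.0 is an assumption of this sketch.
    """
    return l2_sensitivity / math.sqrt(2.0 * rho)


def intruder_posterior(prior, known_counts, noisy_counts, sigma):
    """Posterior over the target's category given a noisy histogram.

    Assumed intruder model: the intruder knows the counts contributed by
    all other records (known_counts) and holds `prior` over the target's
    category. Under hypothesis k, the true count in cell k is one higher.
    """
    log_post = []
    for k, p in enumerate(prior):
        log_lik = 0.0
        for j, (c, y) in enumerate(zip(known_counts, noisy_counts)):
            mean = c + (1 if j == k else 0)
            log_lik += -((y - mean) ** 2) / (2.0 * sigma ** 2)
        log_prior = math.log(p) if p > 0 else float("-inf")
        log_post.append(log_prior + log_lik)
    # Normalize in log space for numerical stability.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def simulate(rho, n_levels=5, true_level=2, seed=0):
    """Release one noisy histogram and measure the intruder's update."""
    rng = random.Random(seed)
    known = [rng.randint(0, 20) for _ in range(n_levels)]
    sigma = gaussian_sigma(rho)
    noisy = [
        c + (1 if j == true_level else 0) + rng.gauss(0.0, sigma)
        for j, c in enumerate(known)
    ]
    prior = [1.0 / n_levels] * n_levels
    post = intruder_posterior(prior, known, noisy, sigma)
    return total_variation(prior, post), post[true_level]


if __name__ == "__main__":
    for rho in (0.01, 0.1, 1.0, 10.0):
        tv, p_true = simulate(rho)
        print(f"rho={rho}: TV(prior, posterior)={tv:.3f}, "
              f"posterior mass on true level={p_true:.3f}")
```

Running the script for several values of `rho` shows how far a single release moves this hypothetical intruder's beliefs at each privacy level. An intruder with more side information could be modeled by replacing the uniform prior with a more concentrated one.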