Randomized controlled trials (RCTs) have become powerful tools for assessing the impact of interventions and policies in many contexts. They are considered the gold standard for causal inference in the biomedical fields and in many social sciences. Researchers have published an increasing number of studies that rely on RCTs for at least part of their inference. These studies typically include the response data, which have been collected, de-identified, and sometimes protected through traditional disclosure limitation methods. In this paper, we empirically assess the impact of privacy-preserving synthetic data generation methodologies on published RCT analyses by leveraging available replication packages (research compendia) in economics and policy analysis. We implement three privacy-preserving algorithms that build on one of the basic differentially private (DP) algorithms, the perturbed histogram, with the goal of preserving the quality of statistical inference. We highlight the challenges of applying this algorithm and the stability-based histogram directly in our setting, and we describe the adjustments needed. We provide simulation studies and demonstrate that we can replicate the analysis of a published economics article on privacy-protected data under various parameterizations. We find that privacy-preserving methods inspired by DP techniques, which are relatively straightforward at a high level, can protect published data while preserving valid inference. The results are relevant to researchers who wish to share RCT data with strong privacy protection, especially in the context of low- and middle-income countries.
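The abstract names the perturbed histogram as the base DP algorithm. As a rough illustration only, and not the authors' implementation (which the abstract notes required adjustments), the sketch below shows the textbook perturbed histogram. It assumes categorical variables with a known, public domain, add/remove-one-record neighboring datasets (so each cell count has sensitivity 1), and Laplace noise with scale 1/ε. The helper names (`perturbed_histogram`, `synthesize`, `diff_in_means`), the toy (treatment, outcome) data, and the clip-and-resample post-processing step are hypothetical choices made for this example.

```python
import random
from collections import Counter
from itertools import product


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturbed_histogram(records, domain, epsilon, rng):
    """Return epsilon-DP noisy counts for every cell of a public `domain`.

    Assumption: add/remove-one-record neighbors, so each count has L1
    sensitivity 1 and Laplace noise with scale 1/epsilon suffices.
    """
    counts = Counter(records)
    # Noise is added to every cell, including empty ones, so the released
    # cells do not reveal which combinations actually occur in the data.
    return {cell: counts.get(cell, 0) + laplace(1.0 / epsilon, rng) for cell in domain}


def synthesize(noisy_counts, rng):
    # Post-processing (no additional privacy cost): clip negative counts to
    # zero, then draw synthetic records proportionally to the clipped counts.
    clipped = {cell: max(0.0, c) for cell, c in noisy_counts.items()}
    n_synth = round(sum(clipped.values()))
    if n_synth == 0:
        return []
    cells = list(clipped)
    return rng.choices(cells, weights=[clipped[c] for c in cells], k=n_synth)


def diff_in_means(data):
    # Simple treatment-effect estimate: mean outcome of treated minus controls.
    treated = [y for t, y in data if t == 1]
    control = [y for t, y in data if t == 0]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical toy RCT: binary treatment and binary outcome.
rng = random.Random(2024)
domain = list(product([0, 1], [0, 1]))  # (treatment, outcome) cells
records = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 25 + [(0, 0)] * 25
noisy = perturbed_histogram(records, domain, epsilon=1.0, rng=rng)
synthetic = synthesize(noisy, rng)
print(diff_in_means(records), diff_in_means(synthetic))
```

Because clipping and resampling only post-process the noisy counts, the synthetic sample inherits the ε-DP guarantee of the noisy histogram. Repeating the draw and comparing `diff_in_means(records)` with `diff_in_means(synthetic)` gives a crude sense of how the noise level affects a simple RCT estimate. It does not reproduce the adjusted algorithms the paper evaluates.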