Combining observational and experimental data for causal inference can improve treatment effect estimation. However, many observational data sets cannot be released due to data privacy considerations, so a single researcher may not have access to both experimental and observational data. Nonetheless, a small risk of disclosing sensitive information might be tolerable to organizations that house confidential data. In these cases, organizations can employ data privacy techniques, which decrease disclosure risk, potentially at the expense of data utility. In this paper, we explore disclosure-limiting transformations of observational data, which can be combined with experimental data to estimate the sample and population average treatment effects. We consider leveraging observational data both to improve the generalizability of treatment effect estimates when a randomized controlled trial (RCT) is not representative of the population of interest and to increase the precision of those estimates. Through simulation studies, we illustrate the trade-off between privacy and utility when employing different disclosure-limiting transformations. We find that leveraging transformed observational data can still improve treatment effect estimation over using RCT data alone.