Does Differentially Private Synthetic Data Lead to Synthetic Discoveries?

Background: Synthetic data has been proposed as a solution for sharing anonymized versions of sensitive biomedical datasets. Ideally, synthetic data should preserve the structure and statistical properties of the original data while protecting the privacy of the individual subjects. Differential privacy (DP) is currently considered the gold standard approach for balancing this trade-off. Objectives: The aim of this study is to evaluate the Mann-Whitney U test on DP-synthetic biomedical data in terms of Type I and Type II errors, in order to establish whether statistical hypothesis testing performed on privacy-preserving synthetic data is likely to lead to a loss of the test's validity or to decreased power. Methods: We evaluate the Mann-Whitney U test on DP-synthetic data generated from real-world data, including a prostate cancer dataset (n=500) and a cardiovascular dataset (n=70,000), as well as on data drawn from two Gaussian distributions. Five different DP-synthetic data generation methods are evaluated: two basic DP histogram release methods and the MWEM, Private-PGM, and DP GAN algorithms. Conclusion: Most of the tested DP-synthetic data generation methods showed inflated Type I error, especially at privacy budget levels of $\epsilon\leq 1$. This result calls for caution when releasing and analyzing DP-synthetic data: low p-values may be obtained in statistical tests simply as a byproduct of the noise added to protect privacy. A DP smoothed histogram-based synthetic data generation method was shown to produce valid Type I error at all privacy levels tested, but it required a large original dataset and a modest privacy budget ($\epsilon\geq 5$) to reach reasonable Type II error levels.
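The kind of Type I error check described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it uses a plain Laplace-noised histogram (sensitivity 1 under add/remove-one neighbors, one histogram per group), treats the synthetic sample size as a public constant, and computes the Mann-Whitney U p-value with the normal approximation and tie correction. The function names, bin edges, sample sizes, and repetition counts are all hypothetical choices made for the example.

```python
import bisect
import math
import random
from statistics import NormalDist


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        avg_rank = (i + 1 + j) / 2     # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                     # all values tied: no evidence of a shift
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def laplace(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_histogram_synth(values, edges, epsilon, n_out, rng):
    """Sample n_out synthetic points from a Laplace-noised histogram.

    Assumption: add/remove-one neighboring datasets, so each count has
    sensitivity 1 and Laplace noise of scale 1/epsilon is used.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(max(idx, 0), len(counts) - 1)] += 1   # clip to outer bins
    noisy = [max(c + laplace(1.0 / epsilon, rng), 0.0) for c in counts]
    if sum(noisy) == 0:                # degenerate case: fall back to uniform
        noisy = [1.0] * len(counts)
    centers = [(edges[k] + edges[k + 1]) / 2 for k in range(len(counts))]
    return rng.choices(centers, weights=noisy, k=n_out)


def type1_error(epsilon, n=100, reps=200, alpha=0.05, seed=0):
    """Share of rejections when both groups come from the same N(0, 1)."""
    rng = random.Random(seed)
    edges = [-4 + 0.5 * k for k in range(17)]            # 16 bins on [-4, 4]
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        synth_a = dp_histogram_synth(a, edges, epsilon, n, rng)
        synth_b = dp_histogram_synth(b, edges, epsilon, n, rng)
        if mann_whitney_p(synth_a, synth_b) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for eps in (0.1, 1.0, 5.0, 10.0):
        print(f"epsilon={eps}: estimated Type I error = {type1_error(eps):.3f}")
```

Because both groups are drawn from the same distribution, a valid test should reject in roughly a fraction `alpha` of repetitions; comparing the estimate across values of `epsilon` gives a rough sense of how much the added noise alone drives rejections. Type II error could be probed the same way by shifting the mean of one group and counting non-rejections.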
