Private synthetic data sharing is preferred over releasing summary statistics because it preserves the distribution and nuances of the original data. State-of-the-art methods adopt a select-measure-generate paradigm, but measuring large-domain marginals still incurs substantial error, and allocating the privacy budget across iterations remains difficult. To address these issues, our method employs a partition-based approach that reduces measurement error and improves the quality of the synthetic data, even under a limited privacy budget. Experiments show that our method outperforms existing approaches: the synthetic data it produces has higher quality and utility, making it a preferable choice for private synthetic data sharing.
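To illustrate why measuring large-domain marginals is costly in the select-measure-generate paradigm, the following is a minimal sketch, assuming a Gaussian mechanism with a fixed noise scale applied to every cell of a marginal. It is not the partition-based method proposed here; the function names, domain sizes, record count, and `sigma` are all hypothetical choices made for illustration.

```python
import numpy as np

def marginal_counts(data, cols, domain_sizes):
    """Exact contingency table over the attributes listed in `cols`."""
    shape = tuple(domain_sizes[c] for c in cols)
    # Map each record's attribute values to a single flat cell index.
    flat = np.ravel_multi_index(tuple(data[:, c] for c in cols), shape)
    return np.bincount(flat, minlength=int(np.prod(shape))).reshape(shape)

def measure(data, cols, domain_sizes, sigma, rng):
    """Noisy marginal: add i.i.d. Gaussian noise to every cell (assumed mechanism)."""
    exact = marginal_counts(data, cols, domain_sizes)
    return exact + rng.normal(0.0, sigma, size=exact.shape)

def l1_error(data, cols, domain_sizes, sigma, rng):
    """Total absolute error of the noisy marginal, normalized by record count."""
    exact = marginal_counts(data, cols, domain_sizes)
    noisy = measure(data, cols, domain_sizes, sigma, rng)
    return np.abs(noisy - exact).sum() / len(data)

rng = np.random.default_rng(0)
domain_sizes = [2, 2, 64, 64]  # hypothetical attribute domains
data = np.column_stack([rng.integers(0, d, size=1000) for d in domain_sizes])
sigma = 5.0                    # hypothetical noise scale

small_err = l1_error(data, [0, 1], domain_sizes, sigma, rng)  # 4 cells
large_err = l1_error(data, [2, 3], domain_sizes, sigma, rng)  # 4096 cells
print(f"small-domain error: {small_err:.3f}, large-domain error: {large_err:.3f}")
```

With the same noise scale, the total error grows with the number of cells, so the 4096-cell marginal accumulates far more noise than the 4-cell one while the true counts are spread ever more thinly across cells.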