Artificial intelligence and data access are already mainstream. One of the main challenges in designing an artificial intelligence system or disclosing content from a database is preserving the privacy of the individuals who participate in the process. Differentially private synthetic data generation has received much attention because it preserves privacy while allowing the synthetic data to be used freely. Private sampling is the first noise-free method for constructing differentially private synthetic data with rigorous bounds on privacy and accuracy. However, this method comes with constraints that seem unrealistic and inapplicable to real-world datasets. In this paper, we provide an implementation of the private sampling algorithm and discuss how realistic its constraints are in practical cases.