Understanding how Differentially Private Generative Models Spend their Privacy Budget

Generative models trained with Differential Privacy (DP) are increasingly used to produce synthetic data while reducing privacy risks. However, their distinct privacy-utility tradeoffs make it difficult to determine which models work best for a given setting or task. In this paper, we address this gap for tabular data by analyzing how DP generative models distribute their privacy budget across rows and columns, arguably the main source of utility degradation. We examine the main factors that determine how the privacy budget is spent, including the underlying modeling technique, the DP mechanism, and data dimensionality. Our extensive evaluation of both graphical and deep generative models sheds light on the distinctive features that make each family suitable for different settings and tasks. We show that graphical models distribute the privacy budget horizontally, across columns, and thus cannot handle relatively wide datasets, while their performance on the task they were optimized for increases monotonically with more data (i.e., more rows). Deep generative models spend their budget per training iteration, so their behavior is less predictable as dataset dimensions vary, but they may perform better when trained on more features. Finally, low levels of privacy ($\epsilon \geq 100$) may help some models generalize, yielding better results than training without DP.
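
To make the horizontal versus per-iteration distinction concrete, here is a minimal sketch under simplifying assumptions that are ours, not the paper's: a graphical model is reduced to measuring one count marginal per column with the Laplace mechanism (sensitivity 1 under add/remove-one-row neighbors), with the total budget split evenly across marginals by sequential composition; the deep-model side uses only basic composition over training steps, whereas real DP-SGD implementations rely on much tighter accountants (e.g., RDP with subsampling amplification). All function names are hypothetical illustrations.

```python
import random


def graphical_noise_scale(epsilon: float, n_marginals: int, sensitivity: float = 1.0) -> float:
    """Laplace scale per marginal when epsilon is split evenly across marginals.

    Hypothetical simplification: one marginal per column, so wider data means
    a smaller per-marginal budget and therefore more noise on every marginal.
    """
    eps_per_marginal = epsilon / n_marginals
    return sensitivity / eps_per_marginal


def expected_relative_error(epsilon: float, n_marginals: int, n_rows: int) -> float:
    """Expected |Laplace noise| equals its scale b; dividing by the row count
    expresses that noise relative to the size of the dataset."""
    return graphical_noise_scale(epsilon, n_marginals) / n_rows


def noisy_counts(counts, epsilon: float, n_marginals: int, rng: random.Random):
    """Add Laplace(0, b) noise to one marginal's counts.

    Laplace(0, b) is sampled as the difference of two independent
    exponential draws with mean b (rate 1/b).
    """
    b = graphical_noise_scale(epsilon, n_marginals)
    return [c + rng.expovariate(1 / b) - rng.expovariate(1 / b) for c in counts]


def per_iteration_epsilon(epsilon: float, n_iterations: int) -> float:
    """Deep-model view under *basic* composition only: the budget is divided
    over training steps and does not depend on the number of columns.
    Real DP-SGD accounting is tighter than this."""
    return epsilon / n_iterations


if __name__ == "__main__":
    eps = 1.0
    # Graphical-model view: error grows with columns and shrinks with rows.
    for n_cols in (5, 20, 80):
        for n_rows in (1_000, 10_000):
            err = expected_relative_error(eps, n_cols, n_rows)
            print(f"cols={n_cols:3d} rows={n_rows:6d} relative noise={err:.4f}")
    # Deep-model view: per-step budget depends on the number of iterations.
    for steps in (100, 1_000):
        print(f"steps={steps:5d} eps/step={per_iteration_epsilon(eps, steps):.5f}")
```

Under these assumptions, quadrupling the number of columns quadruples the per-marginal noise, while ten times more rows cuts the relative noise by ten. This mirrors, in a deliberately crude way, why wide data hurts graphical models and more rows help them. On the deep-model side, the number of columns never enters the budget split in this sketch.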
