DPImageBench: A Unified Benchmark for Differentially Private Image Synthesis

Differentially private (DP) image synthesis aims to generate artificial images that retain the properties of sensitive images while protecting the privacy of individual images within the dataset. Despite recent advances, we find that inconsistent, and sometimes flawed, evaluation protocols have been applied across studies. This not only impedes understanding of current methods but also hinders future progress. To address this issue, this paper introduces DPImageBench, a benchmark for DP image synthesis designed carefully along three dimensions. (1) Methods. We study eleven prominent methods and systematically characterize each by model architecture, pretraining strategy, and privacy mechanism. (2) Evaluation. We include nine datasets and seven fidelity and utility metrics to assess the methods thoroughly. Notably, we find that a common practice, selecting the downstream classifier by its highest accuracy on the sensitive test set, not only violates DP but also overestimates utility scores. DPImageBench corrects this mistake. (3) Platform. Beyond the methods and evaluation protocols, DPImageBench provides a standardized interface that accommodates current and future implementations within a unified framework. With DPImageBench, we obtain several noteworthy findings. For example, contrary to the common wisdom that pretraining on public image datasets is usually beneficial, we find that the distributional similarity between the pretraining images and the sensitive images significantly affects the quality of the synthetic images, and pretraining does not always yield improvements. In addition, methods that add noise to low-dimensional features, such as high-level characteristics of sensitive images, are less affected by the privacy budget than methods that add noise to high-dimensional features, such as weight gradients. Under a low privacy budget, the former outperform the latter.
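The classifier-selection flaw is easiest to see in code. Below is a minimal sketch, assuming a toy k-nearest-neighbor classifier on 2-D points and a hypothetical `select_and_evaluate` helper (neither is DPImageBench's actual interface). The hyperparameter `k` is chosen on a held-out split of the synthetic data, and the sensitive test set is read exactly once, to report the final score.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy setup: each example is ((x, y) feature pair, integer label).
Example = Tuple[Tuple[float, float], int]


def knn_predict(train: List[Example], x: Tuple[float, float], k: int) -> int:
    # Majority vote among the k nearest training points (squared Euclidean distance).
    nearest = sorted(
        train, key=lambda ex: (ex[0][0] - x[0]) ** 2 + (ex[0][1] - x[1]) ** 2
    )[:k]
    votes: Dict[int, int] = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def accuracy(train: List[Example], eval_set: List[Example], k: int) -> float:
    correct = sum(knn_predict(train, x, k) == y for x, y in eval_set)
    return correct / len(eval_set)


def select_and_evaluate(
    synthetic: List[Example],
    sensitive_test: List[Example],
    candidate_ks: Sequence[int],
    val_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[int, float]:
    # Split the *synthetic* data into a training part and a validation part.
    data = synthetic[:]
    random.Random(seed).shuffle(data)
    n_val = max(1, int(len(data) * val_fraction))
    val, train = data[:n_val], data[n_val:]
    # Model selection only touches synthetic data, so it spends no extra privacy budget.
    best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))
    # The sensitive test set is used once, for the final reported score only.
    return best_k, accuracy(train, sensitive_test, best_k)
```

Choosing `k` by its accuracy on `sensitive_test` instead would leak information about that set into the reported result and inflate it, which is the overestimation described above.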

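One rough intuition for the last finding, offered as an assumption rather than the paper's own analysis: under the Gaussian mechanism, the per-coordinate noise scale is set by the privacy budget and the L2 sensitivity, not by the dimension, so the total noise norm grows with the square root of the number of coordinates. The sketch below uses the classical calibration from Dwork and Roth for a single release under add/remove adjacency with per-record L2 clipping; DP-SGD-style training composes many such releases, which this simplification ignores. `gaussian_sigma` and `rms_noise_to_signal` are hypothetical helpers.

```python
import math


def gaussian_sigma(l2_sensitivity: float, epsilon: float, delta: float) -> float:
    """Per-coordinate noise scale of the classical Gaussian mechanism (0 < epsilon < 1)."""
    if not 0 < epsilon < 1:
        raise ValueError("the classical calibration assumes 0 < epsilon < 1")
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def rms_noise_to_signal(
    dim: int, epsilon: float, delta: float = 1e-5, clip_norm: float = 1.0
) -> float:
    """RMS L2 norm of the added noise divided by the largest possible signal norm.

    Noise z ~ N(0, sigma^2 I_dim) has E[||z||^2] = dim * sigma^2, so its RMS norm is
    sigma * sqrt(dim), while a signal clipped to clip_norm never exceeds clip_norm.
    """
    sigma = gaussian_sigma(clip_norm, epsilon, delta)
    return sigma * math.sqrt(dim) / clip_norm


# Assumed sizes: a small feature vector vs. a model-sized gradient.
for dim in (10, 1_000_000):
    print(dim, round(rms_noise_to_signal(dim, epsilon=0.5), 1))
```

Lowering ε raises σ for every coordinate, but for a million-dimensional vector that noise is amplified by a factor of about 316 (√100,000) relative to a 10-dimensional one, which is consistent with high-dimensional noising degrading faster under tight budgets.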