Diffusion models have been remarkably successful in data synthesis. However, when these models are applied to sensitive datasets, such as banking and human face data, they can raise severe privacy concerns. This work presents the first systematic privacy study of property inference attacks against diffusion models, in which adversaries aim to extract sensitive global properties of the training set from a diffusion model. Specifically, we focus on the most practical attack scenario: adversaries are restricted to accessing only synthetic data. Under this realistic scenario, we conduct a comprehensive evaluation of property inference attacks on various diffusion models trained on diverse data types, including tabular and image datasets. A broad range of evaluations reveals that diffusion models and their samplers are universally vulnerable to property inference attacks. In response, we propose PriSampler, a new model-agnostic plug-in method that mitigates the risk of property inference against diffusion models. PriSampler can be directly applied to well-trained diffusion models and supports both stochastic and deterministic sampling. Extensive experiments illustrate the effectiveness of our defense: it leads adversaries to infer property proportions close to the predefined values that model owners specify. Notably, PriSampler also significantly outperforms diffusion models trained with differential privacy in both model utility and defense performance. This work will raise awareness of property inference attacks and encourage privacy-preserving synthetic data release.