MemControl: Mitigating Memorization in Diffusion Models via Automated Parameter Selection

Diffusion models excel at generating images that closely resemble their training data but are also susceptible to data memorization, raising privacy, ethical, and legal concerns, particularly in sensitive domains such as medical imaging. We hypothesize that this memorization stems from the overparameterization of deep models and propose that regularizing model capacity during fine-tuning can mitigate this issue. First, we empirically show that regulating model capacity via parameter-efficient fine-tuning (PEFT) mitigates memorization to some extent; however, it further requires identifying the exact parameter subsets to fine-tune for high-quality generation. To identify these subsets, we introduce a bi-level optimization framework, MemControl, that automates parameter selection using memorization and generation quality metrics as rewards during fine-tuning. The parameter subsets discovered through MemControl achieve a superior tradeoff between generation quality and memorization. For the task of medical image generation, our approach outperforms existing state-of-the-art memorization mitigation strategies by fine-tuning as few as 0.019% of model parameters. Moreover, we demonstrate that the discovered parameter subsets are transferable to non-medical domains. Our framework is scalable to large datasets, agnostic to reward functions, and can be integrated with existing approaches for further memorization mitigation. To the best of our knowledge, this is the first study to empirically evaluate memorization in medical images and propose a targeted yet universal mitigation strategy. The code is available at https://github.com/Raman1121/Diffusion_Memorization_HPO
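
The bi-level idea in the abstract (an outer search over which parameter subsets to fine-tune, an inner fine-tuning run scored on generation quality and memorization) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the group names, the per-group numbers, the stand-in `inner_loop` that replaces real fine-tuning, and the exhaustive enumeration with a Pareto filter. The abstract does not state which search algorithm or metrics MemControl uses, so this is not the authors' implementation.

```python
from itertools import product

# Hypothetical parameter groups a PEFT method might unfreeze (not from the paper).
GROUPS = ("attn_q", "attn_k", "attn_v", "attn_out", "norm", "bias")

# Made-up per-group effects, used only so the toy inner loop returns something.
QUALITY_GAIN = (0.30, 0.10, 0.25, 0.15, 0.05, 0.02)  # reduces quality error
MEMO_COST = (0.10, 0.20, 0.30, 0.15, 0.02, 0.01)     # increases memorization


def inner_loop(mask):
    """Stand-in for the inner level: 'fine-tune' only groups with mask bit 1.

    Returns (quality_error, memorization); lower is better for both.
    A real inner loop would train the model and compute real metrics.
    """
    quality_error = 1.0 - sum(g for g, m in zip(QUALITY_GAIN, mask) if m)
    memorization = sum(c for c, m in zip(MEMO_COST, mask) if m)
    return quality_error, memorization


def dominates(a, b):
    """True if score a is no worse than b on both objectives and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def outer_loop():
    """Outer level: score every binary mask and keep the non-dominated ones."""
    scored = {mask: inner_loop(mask) for mask in product((0, 1), repeat=len(GROUPS))}
    front = {
        mask: score
        for mask, score in scored.items()
        if not any(dominates(other, score) for other in scored.values())
    }
    return scored, front


if __name__ == "__main__":
    scored, front = outer_loop()
    for mask, (q_err, memo) in sorted(front.items(), key=lambda kv: kv[1][1]):
        chosen = [g for g, m in zip(GROUPS, mask) if m]
        print(f"memorization={memo:.2f} quality_error={q_err:.2f} groups={chosen}")
```

Exhaustive enumeration is only feasible here because the toy space has 64 masks; with a real model each inner evaluation is a full fine-tuning run, so a sample-efficient search would be needed in its place.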

