Diffusion models excel at generating images that closely resemble their training data, but they are also susceptible to data memorization, which raises privacy, ethical, and legal concerns, particularly in sensitive domains such as medical imaging. We hypothesize that this memorization stems from the overparameterization of deep models, and we propose that regularizing model capacity during fine-tuning can mitigate it. First, we empirically show that constraining model capacity via parameter-efficient fine-tuning (PEFT) mitigates memorization to some extent. However, high-quality generation additionally requires identifying the exact parameter subsets to fine-tune. To identify these subsets, we introduce MemControl, a bi-level optimization framework that automates parameter selection by using memorization and generation-quality metrics as rewards during fine-tuning. The parameter subsets discovered by MemControl achieve a better trade-off between generation quality and memorization. For medical image generation, our approach outperforms existing state-of-the-art memorization mitigation strategies while fine-tuning as few as 0.019% of model parameters. Moreover, we show that the discovered parameter subsets transfer to non-medical domains. Our framework scales to large datasets, is agnostic to the choice of reward function, and can be combined with existing approaches for further memorization mitigation. To the best of our knowledge, this is the first study to empirically evaluate memorization in medical images and to propose a targeted yet universal mitigation strategy. The code is available at https://github.com/Raman1121/Diffusion_Memorization_HPO
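As a rough illustration of the bi-level idea, the sketch below is not the MemControl implementation. It assumes a hypothetical search space of four parameter groups (`GROUPS`) and a placeholder inner level (`inner_finetune`) that returns invented quality and memorization scores instead of actually fine-tuning a diffusion model. It also assumes a hypothetical linear reward `quality - lam * memorization`. The outer level enumerates which groups to unfreeze and keeps the mask with the highest reward.

```python
from itertools import product
from typing import Callable, Dict, Optional, Tuple

# Hypothetical parameter groups that the outer level may choose to unfreeze.
GROUPS = ("attn", "ffn", "norm", "bias")

# Toy per-group effects (invented for illustration only): unfreezing a group
# adds to generation quality but also to memorization.
QUALITY_GAIN = {"attn": 0.5, "ffn": 0.3, "norm": 0.1, "bias": 0.1}
MEMO_RISK = {"attn": 0.4, "ffn": 0.5, "norm": 0.05, "bias": 0.05}

Mask = Tuple[bool, ...]


def inner_finetune(mask: Mask) -> Dict[str, float]:
    """Inner level (placeholder): 'fine-tune' only the groups selected by
    `mask` and return evaluation metrics. A real version would train the
    diffusion model and compute a quality score and a memorization score."""
    chosen = [g for g, on in zip(GROUPS, mask) if on]
    return {
        "quality": sum(QUALITY_GAIN[g] for g in chosen),
        "memorization": sum(MEMO_RISK[g] for g in chosen),
    }


def reward(metrics: Dict[str, float], lam: float = 1.0) -> float:
    """Hypothetical scalar reward: quality is rewarded, memorization penalized."""
    return metrics["quality"] - lam * metrics["memorization"]


def outer_search(
    evaluate: Callable[[Mask], Dict[str, float]], lam: float = 1.0
) -> Tuple[Optional[Mask], float]:
    """Outer level: search over binary masks and keep the best-rewarded one.
    Exhaustive enumeration is used only because the toy space is tiny."""
    best_mask: Optional[Mask] = None
    best_r = float("-inf")
    for mask in product((False, True), repeat=len(GROUPS)):
        if not any(mask):
            continue  # fine-tuning nothing is not a valid configuration
        r = reward(evaluate(mask), lam)
        if r > best_r:  # strict '>' keeps the first mask found on ties
            best_mask, best_r = mask, r
    return best_mask, best_r


if __name__ == "__main__":
    mask, r = outer_search(inner_finetune)
    selected = [g for g, on in zip(GROUPS, mask) if on]
    print(selected, round(r, 3))
```

Raising `lam` penalizes memorization more heavily and pushes the search toward fewer trainable groups. With `lam=0.0`, the toy search simply unfreezes everything. This loosely mirrors the capacity-versus-memorization trade-off described above. In practice, each inner evaluation is a full fine-tuning run, so the exhaustive loop would be replaced by a more sample-efficient optimizer.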