While diffusion models excel at generating high-quality images, their tendency to memorize training data poses significant privacy and copyright risks. In this work, we identify, for the first time, that memorization induces internal numerical instability, which often manifests as visually ``broken'' artifacts. Inspired by stability analysis in numerical methods, we introduce empirical stability regions, defined in terms of latent update norms, to quantitatively characterize stable behavior during generation. Building on these regions, we propose a principled, on-the-fly framework for step-wise detection and adaptive mitigation. Our approach suppresses memorization without altering prompts or guidance, thereby preserving semantic fidelity and image quality. Extensive experiments on Stable Diffusion 1.4 demonstrate that our method achieves a detection AUC $>0.999$ and a $0.0\%$ memorization rate after mitigation, with negligible overhead ($\approx 0.01$ s per image).
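To make the idea of an empirical stability region concrete, the following is a minimal sketch of step-wise detection under stated assumptions; it is not the implementation evaluated in this work. It assumes that the update norm at a step is the L2 norm of the difference between consecutive latents, that the region is a per-step quantile band estimated from reference (non-memorized) generations, and that a step is flagged when its norm leaves the band. The function names `update_norms`, `fit_stability_region`, and `first_unstable_step`, and the quantile levels, are hypothetical. Adaptive mitigation is not sketched.

```python
import numpy as np

def update_norms(latents):
    """L2 norm of each step-to-step latent update.

    latents: array of shape (T + 1, ...) holding the latent at every step.
    Returns an array of shape (T,).
    """
    diffs = np.diff(latents, axis=0)
    return np.linalg.norm(diffs.reshape(diffs.shape[0], -1), axis=1)

def fit_stability_region(reference_norms, lo_q=0.01, hi_q=0.99):
    """Assumed region: a per-step [lower, upper] quantile band.

    reference_norms: array of shape (N, T), one row per reference run.
    """
    lower = np.quantile(reference_norms, lo_q, axis=0)
    upper = np.quantile(reference_norms, hi_q, axis=0)
    return lower, upper

def first_unstable_step(norms, lower, upper):
    """Index of the first step whose norm leaves the band, or None."""
    outside = (norms < lower) | (norms > upper)
    idx = np.flatnonzero(outside)
    return int(idx[0]) if idx.size else None
```

In an on-the-fly setting, the check in `first_unstable_step` would be applied to each new update norm as it is produced, so that a mitigation step can be triggered at the first flagged index rather than after generation completes.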