Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift

Personalizing text-to-image diffusion models involves integrating novel visual concepts from a small set of reference images while retaining the model's original generative capabilities. However, this process often leads to overfitting, where the model ignores the user's prompt and merely replicates the reference images. We attribute this issue to a fundamental misalignment between the true goals of personalization, which are subject fidelity and text alignment, and the training objectives of existing methods that fail to enforce both objectives simultaneously. Specifically, prior approaches often overlook the need to explicitly preserve the pretrained model's output distribution, resulting in distributional drift that undermines diversity and coherence. To resolve these challenges, we introduce a Lipschitz-based regularization objective that constrains parameter updates during personalization, ensuring bounded deviation from the original distribution. This promotes consistency with the pretrained model's behavior while enabling accurate adaptation to new concepts. Furthermore, our method offers a computationally efficient alternative to commonly used, resource-intensive sampling techniques. Through extensive experiments across diverse diffusion model architectures, we demonstrate that our approach achieves superior performance in both quantitative metrics and qualitative evaluations, consistently excelling in visual fidelity and prompt adherence. We further support these findings with comprehensive analyses, including ablation studies and visualizations.
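The abstract does not state the exact form of the regularizer, so the following is only a minimal sketch of the general idea, under stated assumptions: a toy linear model stands in for the diffusion network, and the penalty is taken to be the squared L2 distance between the personalized and pretrained parameters. All names (`toy_model`, `personalization_loss`, `reg_weight`, `output_drift_bound`) are hypothetical and are not drawn from the paper. For a linear model the output is Lipschitz in the parameters with constant equal to the input norm, so bounding the parameter update also bounds how far the output can move from the pretrained output.

```python
import math


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def l2_distance(a, b):
    """Euclidean distance between two equally sized vectors."""
    return l2_norm([x - y for x, y in zip(a, b)])


def toy_model(theta, x):
    """Toy linear stand-in for a network: dot product of weights and input."""
    return sum(t * xi for t, xi in zip(theta, x))


def personalization_loss(theta, theta_pretrained, x, target, reg_weight):
    """Fit term on the new concept plus an assumed penalty on parameter drift."""
    fit = (toy_model(theta, x) - target) ** 2
    drift = l2_distance(theta, theta_pretrained) ** 2
    return fit + reg_weight * drift


def output_drift_bound(theta, theta_pretrained, x):
    """Cauchy-Schwarz bound: |f(theta, x) - f(theta0, x)| <= ||x|| * ||theta - theta0||."""
    return l2_norm(x) * l2_distance(theta, theta_pretrained)


# Illustrative values only; they are not taken from any experiment.
theta_pretrained = [0.5, -1.0, 2.0]
theta_personalized = [0.6, -0.9, 1.8]
x = [1.0, 2.0, 3.0]

actual_drift = abs(toy_model(theta_personalized, x) - toy_model(theta_pretrained, x))
bound = output_drift_bound(theta_personalized, theta_pretrained, x)
print("output drift:", actual_drift, "bound:", bound)
```

In this toy setting, raising `reg_weight` pulls the optimum toward the pretrained parameters, which tightens the bound on output drift at the cost of a looser fit to the new concept. Whether the paper's objective takes this form is not something the abstract confirms.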

