While diffusion model fine-tuning is a powerful approach for customizing pre-trained models to generate specific objects, it frequently suffers from overfitting when training samples are limited, compromising both generalization capability and output diversity. This paper tackles the challenging and highly impactful task of adapting a diffusion model from just a single concept image, the setting with the greatest practical potential. We introduce T-LoRA, a Timestep-Dependent Low-Rank Adaptation framework designed specifically for diffusion model personalization. We show that higher diffusion timesteps are more prone to overfitting than lower ones, necessitating a timestep-sensitive fine-tuning strategy. T-LoRA incorporates two key innovations: (1) a dynamic fine-tuning strategy that adjusts rank-constrained updates based on the diffusion timestep, and (2) a weight parametrization technique that ensures independence between adapter components through orthogonal initialization. Extensive experiments on SD-XL and FLUX-1.dev show that T-LoRA and its individual components outperform standard LoRA and other diffusion model personalization techniques, achieving a superior balance between concept fidelity and text alignment. The project page is available at https://controlgenai.github.io/T-LoRA/.
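The two ingredients named in the abstract can be sketched as a minimal LoRA-style layer whose effective rank shrinks at higher diffusion timesteps and whose factors are initialized orthogonally. This is an illustrative assumption of how such a mechanism could look; the class name, the linear rank schedule, and the initialization of both factors are not the paper's exact formulation.

```python
import torch


class TimestepRankLoRA(torch.nn.Module):
    """Sketch of a timestep-dependent low-rank update delta_W = B M(t) A.

    Higher timesteps (noisier inputs, more prone to overfitting per the
    paper) get fewer active rank components via the diagonal mask M(t).
    The linear schedule and orthogonal init of both factors are
    illustrative assumptions, not the published method.
    """

    def __init__(self, d_in: int, d_out: int, rank: int = 4, max_timestep: int = 1000):
        super().__init__()
        self.rank, self.max_t = rank, max_timestep
        self.A = torch.nn.Parameter(torch.empty(rank, d_in))
        self.B = torch.nn.Parameter(torch.empty(d_out, rank))
        # Orthogonal initialization keeps the rank-r directions
        # independent of one another (the "adapter component
        # independence" idea from the abstract).
        torch.nn.init.orthogonal_(self.A)
        torch.nn.init.orthogonal_(self.B)

    def delta(self, t: float) -> torch.Tensor:
        # Fewer active rank components as the timestep t grows.
        active = max(1, int(self.rank * (1 - t / self.max_t)))
        mask = torch.zeros(self.rank)
        mask[:active] = 1.0
        return self.B @ torch.diag(mask) @ self.A


lora = TimestepRankLoRA(d_in=16, d_out=16, rank=4)
# Low timestep -> 3 of 4 components active; high timestep -> only 1.
rank_low_t = int(torch.linalg.matrix_rank(lora.delta(100.0)))
rank_high_t = int(torch.linalg.matrix_rank(lora.delta(900.0)))
```

Because both factors have orthonormal rank directions, the rank of the masked update equals the number of active components, so the weight update at high timesteps is strictly lower-capacity than at low ones.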