DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models

In the realm of subject-driven text-to-image (T2I) generative models, recent developments like DreamBooth and BLIP-Diffusion have achieved impressive results, yet they are limited by intensive fine-tuning demands and substantial parameter requirements. While the low-rank adaptation (LoRA) module within DreamBooth reduces the number of trainable parameters, it introduces a pronounced sensitivity to hyperparameters, forcing a compromise between parameter efficiency and the quality of personalized T2I image synthesis. To address these constraints, we introduce \textbf{\textit{DiffuseKronA}}, a novel Kronecker product-based adaptation module that not only reduces the parameter count by 35\% and 99.947\% compared to LoRA-DreamBooth and the original DreamBooth, respectively, but also enhances the quality of image synthesis. Crucially, \textit{DiffuseKronA} mitigates hyperparameter sensitivity, delivering consistently high-quality generations across a wide range of hyperparameters and thereby reducing the need for extensive fine-tuning. Furthermore, a more controllable decomposition makes \textit{DiffuseKronA} more interpretable, and it can even achieve up to a 50\% reduction in parameters with results comparable to LoRA-DreamBooth. Evaluated on diverse and complex input images and text prompts, \textit{DiffuseKronA} consistently outperforms existing models. It produces diverse, higher-quality images with improved fidelity and more accurate object color distributions while maintaining exceptional parameter efficiency, presenting a substantial advancement in T2I generative modeling. Our project page, with links to the code and pre-trained checkpoints, is available at \href{https://diffusekrona.github.io/}{https://diffusekrona.github.io/}.
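One common way to build a Kronecker product-based adapter is to freeze the pretrained weight $W_0 \in \mathbb{R}^{d_{out} \times d_{in}}$ and learn an update $\Delta W = A \otimes B$. Here $A \in \mathbb{R}^{a_1 \times a_2}$, $B \in \mathbb{R}^{b_1 \times b_2}$, $a_1 b_1 = d_{out}$, and $a_2 b_2 = d_{in}$.

The NumPy sketch below is a minimal illustration that assumes this formulation. It is not the authors' code. The function names, the `scale` factor, and the 768 x 768 layer with 64 x 64 and 12 x 12 factors are all hypothetical choices made for illustration.

The sketch shows two things:

- The update can be applied without materializing the full matrix. It uses the identity $(A \otimes B)\,x = \mathrm{vec}_r(A X B^\top)$, where $X$ is $x$ reshaped row-major to $a_2 \times b_2$.
- The adapter stores $a_1 a_2 + b_1 b_2$ trainable values, compared with $r(d_{in} + d_{out})$ for a rank-$r$ LoRA update.

```python
import numpy as np


def kron_adapter_forward(W0, A, B, x, scale=1.0):
    """Return W0 @ x + scale * (A kron B) @ x.

    The Kronecker update is applied through the identity
    (A kron B) @ x == (A @ X @ B.T).ravel(), where X = x.reshape(a2, b2)
    in row-major (C) order, so the d_out x d_in update is never built.
    """
    a1, a2 = A.shape
    b1, b2 = B.shape
    if W0.shape != (a1 * b1, a2 * b2):
        raise ValueError("factor shapes must tile W0: (a1*b1, a2*b2)")
    X = x.reshape(a2, b2)           # row-major view of the input vector
    delta = (A @ X @ B.T).ravel()   # same result as np.kron(A, B) @ x
    return W0 @ x + scale * delta


def kron_params(a1, a2, b1, b2):
    """Trainable values in the two Kronecker factors."""
    return a1 * a2 + b1 * b2


def lora_params(d_out, d_in, r):
    """Trainable values in a rank-r LoRA update (d_out x r times r x d_in)."""
    return r * (d_in + d_out)


def full_params(d_out, d_in):
    """Values in a dense d_out x d_in weight update."""
    return d_out * d_in


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 768                               # hypothetical layer width
    W0 = rng.standard_normal((d, d))      # frozen pretrained weight
    A = rng.standard_normal((64, 64))     # hypothetical factor shapes:
    B = rng.standard_normal((12, 12))     # 64 * 12 = 768 on both sides
    x = rng.standard_normal(d)

    y = kron_adapter_forward(W0, A, B, x, scale=0.1)
    y_dense = W0 @ x + 0.1 * np.kron(A, B) @ x   # materialized reference
    print(np.allclose(y, y_dense))                 # True
    print(kron_params(64, 64, 12, 12))             # 4240
    print(lora_params(d, d, r=4))                  # 6144
    print(full_params(d, d))                       # 589824
```

With these hypothetical shapes, the Kronecker factors hold 4,240 values. A rank-4 LoRA update holds 6,144, and the full matrix holds 589,824. Real savings depend on how each layer's dimensions are factorized, and these counts are not the figures reported in the abstract.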