Recent subject-driven text-to-image (T2I) generative models such as DreamBooth and BLIP-Diffusion produce impressive results but are limited by intensive fine-tuning demands and substantial parameter requirements. Although the low-rank adaptation (LoRA) module within DreamBooth reduces the number of trainable parameters, it introduces a pronounced sensitivity to hyperparameters, forcing a trade-off between parameter efficiency and the quality of personalized T2I image synthesis. To address these constraints, we introduce \textbf{\textit{DiffuseKronA}}, a novel Kronecker product-based adaptation module that not only reduces the parameter count by 35\% and 99.947\% compared to LoRA-DreamBooth and the original DreamBooth, respectively, but also enhances the quality of image synthesis. Crucially, \textit{DiffuseKronA} mitigates hyperparameter sensitivity, delivering consistently high-quality generations across a wide range of hyperparameters and thereby reducing the need for extensive fine-tuning. Furthermore, a more controllable decomposition makes \textit{DiffuseKronA} more interpretable, and it can achieve up to a 50\% parameter reduction with results comparable to LoRA-DreamBooth. Evaluated on diverse and complex input images and text prompts, \textit{DiffuseKronA} consistently outperforms existing models, producing diverse, higher-quality images with improved fidelity and a more accurate color distribution of objects while maintaining exceptional parameter efficiency, a substantial advancement in T2I generative modeling. Our project page, with links to the code and pre-trained checkpoints, is available at https://diffusekrona.github.io/.
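To make the parameter-efficiency argument concrete, the following is a minimal sketch of how a Kronecker-factored weight update compares in size with a low-rank one. It assumes a plain factorization of the form $\Delta W = A \otimes B$, which may differ from the actual \textit{DiffuseKronA} module. The layer size (768), the factor shapes, the LoRA rank, and the helper names `kron`, `kron_params`, and `lora_params` are hypothetical choices for illustration. The sketch does not reproduce the 35\% or 99.947\% figures reported above.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def kron_params(a_shape, b_shape):
    """Trainable parameters when the update is stored as factors A and B."""
    return a_shape[0] * a_shape[1] + b_shape[0] * b_shape[1]


def lora_params(d_out, d_in, rank):
    """Trainable parameters of a rank-`rank` update (d_out x r times r x d_in)."""
    return rank * (d_out + d_in)


# Tiny example: a (2 x 2) and a (2 x 2) factor give a (4 x 4) update.
small = kron([[1, 2], [3, 4]], [[0, 1], [1, 0]])

# Hypothetical 768 x 768 layer (illustrative sizes, not taken from the paper).
d_out = d_in = 768
full = d_out * d_in                            # dense update
lora = lora_params(d_out, d_in, rank=4)        # low-rank update
krona = kron_params((32, 32), (24, 24))        # 32 * 24 = 768 on each side

if __name__ == "__main__":
    print("dense:", full, "LoRA:", lora, "Kronecker:", krona)
```

Under these assumed shapes, the two Kronecker factors together hold fewer parameters than the rank-4 low-rank pair while still defining a full-size update. The shapes of $A$ and $B$ act as the knob that sets the trade-off between parameter count and capacity.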