Personalized text-to-image generation has gained significant attention for its ability to generate high-fidelity portraits of specific identities conditioned on user-defined prompts. Existing methods typically rely on test-time fine-tuning or an additional pre-trained branch, yet they struggle to simultaneously achieve efficiency, identity fidelity, and preservation of the model's original generative capabilities. In this paper, we propose DiffLoRA, an efficient method that leverages a diffusion model as a hypernetwork to predict personalized Low-Rank Adaptation (LoRA) weights from reference images. By incorporating these LoRA weights into an off-the-shelf text-to-image model, DiffLoRA enables zero-shot personalization at inference time, eliminating the need for post-processing optimization. Moreover, we introduce a novel identity-oriented LoRA weight construction pipeline that facilitates the training of DiffLoRA; the dataset generated through this pipeline enables DiffLoRA to produce consistently high-quality LoRA weights. Notably, the probabilistic modeling inherent to the diffusion model allows it to capture intricate structural patterns in the weights and to explore the weight space thoroughly, yielding superior predicted weights. Comprehensive experimental results demonstrate that DiffLoRA outperforms existing personalization approaches across multiple benchmarks, achieving both time efficiency and identity fidelity throughout the personalization process.
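To make the zero-shot mechanism concrete, the sketch below illustrates one plausible way hypernetwork-predicted LoRA factors could be injected into a frozen linear layer of a text-to-image model, so that no fine-tuning is needed at inference. This is a minimal illustration, not the authors' implementation; the class name, rank, and `load_predicted` interface are hypothetical, and the standard LoRA update W + BA is assumed.

```python
# Minimal sketch (assumptions, not the authors' code): merging hypernetwork-predicted
# LoRA factors into a frozen linear layer at inference time.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer augmented with a low-rank update W + B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base.requires_grad_(False)  # keep the pretrained weights frozen
        self.A = nn.Parameter(torch.zeros(rank, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def load_predicted(self, A: torch.Tensor, B: torch.Tensor) -> None:
        # Plug in LoRA factors produced by the hypernetwork; no gradient steps required.
        with torch.no_grad():
            self.A.copy_(A)
            self.B.copy_(B)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the identity-specific low-rank correction.
        return self.base(x) + x @ self.A.t() @ self.B.t()
```

Under this reading, the diffusion hypernetwork would map reference-image features to a flat vector of LoRA parameters, which is reshaped into per-layer (A, B) pairs and loaded via `load_predicted` before the base model generates identity-consistent images.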