The goal of personalization and stylization in text-to-image (T2I) generation is to guide a pre-trained diffusion model to learn new concepts introduced by users and to render them in the desired styles. Parameter-efficient fine-tuning (PEFT) approaches have recently been widely adopted for this task and have substantially advanced the field. Despite their popularity, existing PEFT methods still struggle to achieve effective personalization and stylization in T2I generation. To address this issue, we propose block-wise Low-Rank Adaptation (LoRA), which performs fine-grained fine-tuning of the different blocks of Stable Diffusion (SD) so that generated images are faithful to the input prompt and the target identity while also exhibiting the desired style. Extensive experiments demonstrate the effectiveness of the proposed method.
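As a rough illustration of what block-wise LoRA could look like, the sketch below attaches a low-rank update to each of several frozen weight matrices and lets the rank differ per block. It is not the authors' implementation: the abstract does not say how blocks are grouped or which ranks are used, so the block names (`down`, `mid`, `up`), the rank values, the 16x16 toy weights, and the helpers `LoRALinear` and `build_blockwise_lora` are all assumptions made for this example, and NumPy stands in for a real diffusion framework.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, rng=None):
        rng = rng or np.random.default_rng(0)
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen pre-trained weight, never updated
        self.rank = rank
        # rank 0 means "no adapter on this block"; the update is then empty.
        self.scale = alpha / rank if rank > 0 else 0.0
        # Standard LoRA initialization: A small random, B zero, so the
        # adapted layer initially reproduces the frozen layer exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))

    def __call__(self, x):
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def num_trainable(self):
        return self.A.size + self.B.size


def build_blockwise_lora(block_weights, block_ranks):
    """Wrap each block's weight with its own rank (hypothetical helper)."""
    return {
        name: LoRALinear(weight, block_ranks.get(name, 0))
        for name, weight in block_weights.items()
    }


# Toy stand-ins for three groups of blocks; names and ranks are illustrative.
rng = np.random.default_rng(0)
block_weights = {name: rng.normal(size=(16, 16)) for name in ("down", "mid", "up")}
block_ranks = {"down": 4, "mid": 0, "up": 8}

layers = build_blockwise_lora(block_weights, block_ranks)
x = rng.normal(size=(2, 16))
for name, layer in layers.items():
    print(name, "rank:", layer.rank, "trainable params:", layer.num_trainable())
```

In this sketch the per-block rank is the only knob: a larger rank gives a block more capacity to change, and a rank of zero leaves it untouched. How such capacity should be split between identity-related and style-related blocks is exactly what the abstract leaves unspecified.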