In recent years, image editing has garnered growing attention. However, general image editing models often fail to produce satisfactory results when confronted with new styles. The challenge lies in how to effectively fine-tune general image editing models to new styles using only a limited amount of paired data. To address this issue, this paper proposes a novel few-shot style editing framework. For this task, we construct a benchmark dataset that encompasses five distinct styles. Correspondingly, we propose a parameter-efficient multi-style Mixture-of-Experts Low-Rank Adaptation (MoE LoRA) with style-specific and style-shared routing mechanisms for jointly fine-tuning multiple styles. The style-specific routing ensures that different styles do not interfere with one another, while the style-shared routing adaptively allocates shared MoE LoRAs to learn common patterns. Our MoE LoRA can automatically determine the optimal rank for each layer through a novel metric-guided approach that estimates the importance score of each single-rank component. Additionally, we explore the optimal location to insert LoRA within the Diffusion Transformer (DiT) model and integrate adversarial learning and flow matching to guide the diffusion training process. Experimental results demonstrate that our proposed method outperforms existing state-of-the-art approaches with significantly fewer LoRA parameters. Our code and dataset are available at https://github.com/cao-cong/FSMSE.
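To make the routing idea concrete, the following is a minimal PyTorch sketch of a linear layer augmented with multi-style MoE LoRA: each style owns a dedicated low-rank expert selected by its style ID (style-specific routing), while a small pool of shared experts is blended by a learned gate (style-shared routing). All names, shapes, and hyperparameters here are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MoELoRALinear(nn.Module):
    """Sketch of a multi-style MoE LoRA layer wrapping a frozen linear projection."""

    def __init__(self, base: nn.Linear, num_styles: int = 5,
                 num_shared: int = 2, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base  # frozen pretrained projection from the DiT backbone
        for p in self.base.parameters():
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        self.scale = alpha / rank
        # Style-specific experts: one LoRA pair (A, B) per style.
        self.style_A = nn.Parameter(torch.randn(num_styles, d_in, rank) * 0.01)
        self.style_B = nn.Parameter(torch.zeros(num_styles, rank, d_out))
        # Style-shared experts mixed by a learned gate.
        self.shared_A = nn.Parameter(torch.randn(num_shared, d_in, rank) * 0.01)
        self.shared_B = nn.Parameter(torch.zeros(num_shared, rank, d_out))
        self.gate = nn.Linear(d_in, num_shared)

    def forward(self, x: torch.Tensor, style_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, d_in); style_id: (batch,) integer style labels
        out = self.base(x)
        # Style-specific path: pick the expert belonging to each sample's style.
        A = self.style_A[style_id]                      # (batch, d_in, rank)
        B = self.style_B[style_id]                      # (batch, rank, d_out)
        out = out + self.scale * torch.bmm(torch.bmm(x, A), B)
        # Style-shared path: softmax gate over shared experts, applied per token.
        w = torch.softmax(self.gate(x), dim=-1)         # (batch, tokens, num_shared)
        for e in range(self.shared_A.shape[0]):
            delta = (x @ self.shared_A[e]) @ self.shared_B[e]
            out = out + self.scale * w[..., e:e + 1] * delta
        return out
```

Separating the two paths is what the abstract motivates: the per-style experts keep gradients from different styles isolated, while the gated shared experts capture editing patterns common to all styles from the limited paired data.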