One highly promising direction for enabling flexible, real-time, on-device image editing is data distillation: large-scale text-to-image diffusion models, such as Stable Diffusion, generate paired datasets that are then used to train generative adversarial networks (GANs). This approach notably alleviates the need for the high-end commercial GPUs that image editing with diffusion models typically requires. However, unlike text-to-image diffusion models, each distilled GAN is specialized for a specific image editing task, so obtaining models for many concepts requires costly training. In this work, we introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient? To achieve this goal, we propose a series of techniques. First, we construct a base GAN model with generalized features that can be adapted to different concepts through fine-tuning, eliminating the need to train from scratch. Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model. Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time. Extensive experiments show that we can efficiently empower GANs to perform real-time, high-quality image editing on mobile devices with remarkably reduced training cost and storage for each concept.
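To make the second technique in the abstract concrete, the following is a minimal sketch of how a low-rank update changes the trainable parameter count and the effective weight of one layer. It is an illustration under stated assumptions, not the authors' implementation: the layer is treated as a plain `d_out x d_in` matrix (a GAN's convolutional layers would need reshaping), the scaling `alpha / rank` follows the common LoRA convention, and `smallest_rank_meeting` with its `score_fn` is a hypothetical stand-in for the rank search, whose actual procedure the abstract does not describe.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters LoRA adds for one (d_out x d_in) weight.

    B has shape d_out x rank and A has shape rank x d_in.
    """
    return rank * (d_out + d_in)


def lora_effective_weight(w0, b, a, alpha, rank):
    """Return W0 + (alpha / rank) * B @ A.

    W0 is the frozen base weight; only A and B would be trained.
    """
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


def smallest_rank_meeting(score_fn, candidate_ranks, target):
    """Hypothetical rank search (an assumption, not the paper's procedure).

    Returns the smallest candidate rank whose score reaches the target,
    falling back to the largest candidate if none does.
    """
    for r in sorted(candidate_ranks):
        if score_fn(r) >= target:
            return r
    return max(candidate_ranks)


# Parameter comparison for one 512 x 512 weight at rank 4.
full_params = 512 * 512
lora_params = lora_param_count(512, 512, 4)
print(full_params, lora_params)

# With B initialised to zeros, the adapted layer starts identical to the base layer.
w0 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[0.0], [0.0]]            # d_out x rank, zeros at initialisation
a = [[0.5, -0.5, 0.25]]       # rank x d_in
print(lora_effective_weight(w0, b, a, alpha=1.0, rank=1) == w0)
```

Because only `A` and `B` are stored per concept, the per-concept storage in this sketch scales with `rank * (d_out + d_in)` rather than `d_out * d_in`, which is the kind of saving the abstract refers to.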