While classifier-free guidance (CFG) is essential for conditional diffusion models, it doubles the number of neural function evaluations (NFEs) per inference step. To mitigate this inefficiency, we introduce adapter guidance distillation (AGD), a novel approach that simulates CFG in a single forward pass. AGD leverages lightweight adapters to approximate CFG, effectively doubling the sampling speed while maintaining or even improving sample quality. Unlike prior guidance distillation methods that tune the entire model, AGD keeps the base model frozen and only trains minimal additional parameters ($\sim$2%) to significantly reduce the resource requirement of the distillation phase. Additionally, this approach preserves the original model weights and enables the adapters to be seamlessly combined with other checkpoints derived from the same base model. We also address a key mismatch between training and inference in existing guidance distillation methods by training on CFG-guided trajectories instead of standard diffusion trajectories. Through extensive experiments, we show that AGD achieves comparable or superior FID to CFG across multiple architectures with only half the NFEs. Notably, our method enables the distillation of large models ($\sim$2.6B parameters) on a single consumer GPU with 24 GB of VRAM, making it more accessible than previous approaches that require multiple high-end GPUs. We will publicly release the implementation of our method.
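To make the NFE argument concrete, here is a minimal sketch of why CFG costs two forward passes per step and how an adapter-style single pass can be trained to match it. The toy `eps_model` and the scalar "adapter" offset are purely illustrative assumptions; the abstract does not specify the actual architecture or adapter design.

```python
import numpy as np

def eps_model(x, cond):
    # Toy noise predictor standing in for a diffusion model:
    # a fixed linear map plus a shift when conditioning is present.
    return 0.5 * x + (0.1 if cond is not None else 0.0)

def cfg_prediction(x, cond, w):
    # Classifier-free guidance: TWO forward passes per sampling step,
    # combined as eps_uncond + w * (eps_cond - eps_uncond).
    eps_c = eps_model(x, cond)
    eps_u = eps_model(x, None)
    return eps_u + w * (eps_c - eps_u)

def adapter_prediction(x, cond, adapter_offset):
    # AGD-style idea (sketch): ONE forward pass whose output is corrected
    # by a lightweight adapter, so only a single NFE is needed per step.
    return eps_model(x, cond) + adapter_offset

x = np.ones(4)
target = cfg_prediction(x, cond="class_7", w=3.0)
# In AGD the adapter is *trained* so the single pass matches the guided
# output; here we solve for the "learned" correction by hand to illustrate.
learned = target - eps_model(x, "class_7")
approx = adapter_prediction(x, "class_7", learned)
print(np.allclose(approx, target))  # True
```

The sketch also hints at the train/inference mismatch the abstract mentions: the regression target (`target`) comes from CFG-guided outputs, so distilling on CFG-guided trajectories keeps the adapter's training inputs consistent with what it sees at sampling time.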