Diffusion models are a powerful generative framework, but their inference is expensive. Existing acceleration methods often compromise image quality, or fail under complex conditioning, when operating in an extremely low-step regime. In this work, we propose a novel distillation framework tailored to high-fidelity, diverse sample generation in just one to three steps. Our approach comprises three key components: (i) Backward Distillation, which mitigates training-inference discrepancies by calibrating the student on its own backward trajectory; (ii) Shifted Reconstruction Loss, which dynamically adapts knowledge transfer to the current time step; and (iii) Noise Correction, an inference-time technique that enhances sample quality by addressing singularities in noise prediction. Extensive experiments demonstrate that our method outperforms existing approaches on both quantitative metrics and human evaluation. Remarkably, it matches the teacher model's performance using only three denoising steps, enabling efficient high-quality generation.
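As a rough intuition for the first component, Backward Distillation can be sketched on a toy problem: the student is trained on states drawn from its own backward (denoising) trajectory, with the teacher queried at those same states, so that training matches inference. Everything below, the linear "teacher", the `ToyStudent` class, the step schedule, and the learning rate, is a hypothetical stand-in for illustration, not the paper's actual models or loss.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                        # toy data dimension (arbitrary)
lr = 0.05                    # learning rate (arbitrary)
steps = [1.0, 0.66, 0.33]    # a three-step schedule, as in the abstract

def teacher_denoise(x, t):
    # Hypothetical frozen "teacher": shrinks x toward clean data (the origin).
    return x * (1.0 - 0.5 * t)

class ToyStudent:
    # Hypothetical student with one trainable scale per dimension.
    def __init__(self):
        self.w = np.full(D, 1.5)  # deliberately mis-calibrated at init

    def denoise(self, x, t):
        return self.w * x * (1.0 - 0.5 * t)

student = ToyStudent()
for _ in range(500):
    x = rng.normal(size=D)        # start each trajectory from pure noise
    grad = np.zeros(D)
    for t in steps:
        s = 1.0 - 0.5 * t
        target = teacher_denoise(x, t)  # teacher queried at the STUDENT's state
        pred = student.denoise(x, t)
        grad += 2.0 * (pred - target) * x * s / len(steps)  # d(MSE)/dw
        x = pred                  # follow the student's OWN backward trajectory
    student.w -= lr * grad        # SGD step (trajectory treated as stop-gradient)

# The student's per-dimension scale converges toward the teacher's (w close to 1).
```

The point of the sketch is only the supervision scheme: because the student is calibrated on the states it will actually visit at inference time, the train/inference discrepancy that hurts few-step sampling is avoided. In the actual method the teacher and student are full diffusion networks, and a plain MSE is used here purely for simplicity.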