We introduce NitroFusion, a fundamentally different approach to single-step diffusion that achieves high-quality generation through a dynamic adversarial framework. One-step methods offer dramatic speed advantages, but they typically suffer from quality degradation compared to their multi-step counterparts. Just as a panel of art critics provides comprehensive feedback by specializing in different aspects such as composition, color, and technique, our approach maintains a large pool of specialized discriminator heads that collectively guide the generation process. Each discriminator group develops expertise in specific quality aspects at different noise levels, providing diverse feedback that enables high-fidelity one-step generation. Our framework combines: (i) a dynamic discriminator pool with specialized discriminator groups to improve generation quality, (ii) strategic refresh mechanisms to prevent discriminator overfitting, and (iii) global-local discriminator heads for multi-scale quality assessment, together with unconditional/conditional training for balanced generation. Additionally, our framework supports flexible deployment through bottom-up refinement, allowing users to choose between one and four denoising steps with the same model for a direct quality-speed trade-off. Through comprehensive experiments, we demonstrate that NitroFusion significantly outperforms existing single-step methods across multiple evaluation metrics, particularly in preserving fine details and global consistency.
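The abstract does not say how the discriminator pool is sampled or refreshed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a random subset of heads scores each batch and that a fixed fraction of the pool is re-initialized after every step. The names `Head`, `DiscriminatorPool`, `refresh_fraction`, and `score` are hypothetical, and the linear score is a placeholder for a real network forward pass.

```python
import random
from dataclasses import dataclass


@dataclass
class Head:
    """Stand-in for one discriminator head (hypothetical; a real head is a small network)."""
    head_id: int
    weights: list
    age: int = 0  # training steps seen since (re)initialization


class DiscriminatorPool:
    """Toy pool of heads with random subset sampling and periodic refresh (assumed scheme)."""

    def __init__(self, pool_size, dim, refresh_fraction, seed=0):
        self.rng = random.Random(seed)
        self.dim = dim
        self.refresh_fraction = refresh_fraction
        self.next_id = 0
        self.heads = [self._new_head() for _ in range(pool_size)]

    def _new_head(self):
        # Fresh random weights model a newly initialized head.
        head = Head(self.next_id, [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)])
        self.next_id += 1
        return head

    def sample(self, k):
        """Pick k distinct heads to score the current batch."""
        return self.rng.sample(self.heads, k)

    def refresh(self):
        """Replace a random fraction of heads with new ones; return how many were replaced."""
        n = int(len(self.heads) * self.refresh_fraction)
        for idx in self.rng.sample(range(len(self.heads)), n):
            self.heads[idx] = self._new_head()
        return n


def score(head, features):
    """Toy linear 'realness' score; placeholder for a real forward pass."""
    return sum(w * x for w, x in zip(head.weights, features))


pool = DiscriminatorPool(pool_size=8, dim=4, refresh_fraction=0.25)
features = [0.1, -0.2, 0.3, 0.4]  # made-up feature vector for illustration
for step in range(3):
    active = pool.sample(k=4)
    scores = [score(h, features) for h in active]
    for h in active:
        h.age += 1  # a real training loop would take an optimizer step here
    replaced = pool.refresh()
```

Under these assumptions, no single head is trained for long before it may be discarded, which is one plausible way a refresh mechanism could limit discriminator overfitting while the pool as a whole still provides varied feedback.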