Recent advances in text-to-image (T2I) generation have produced highly expressive models such as diffusion transformers (DiTs), exemplified by FLUX. However, their massive parameter counts lead to slow inference, high memory usage, and poor deployability. Existing acceleration methods (e.g., single-step distillation and attention pruning) often suffer significant performance degradation and incur substantial training costs. To address these limitations, we propose FastFLUX, an architecture-level pruning framework designed to improve the inference efficiency of FLUX. At its core is Block-wise Replacement with Linear Layers (BRLL), which replaces the structurally complex residual branches in ResBlocks with lightweight linear layers while preserving the original shortcut connections for stability. We further introduce Sandwich Training (ST), a localized fine-tuning strategy that applies LoRA to the blocks neighboring a replaced block, mitigating the performance drop caused by structural replacement. Experiments show that FastFLUX maintains high image quality under both qualitative and quantitative evaluation while significantly improving inference speed, even with 20\% of its blocks pruned. Our code will be available soon.
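To make the BRLL idea concrete, the following is a minimal NumPy sketch, not the paper's implementation: a hypothetical `ResBlock` with a two-layer nonlinear residual branch is replaced by a `BRLLBlock` whose branch is a single linear map, while the shortcut connection is kept in both cases. All class and parameter names here are illustrative assumptions, not taken from the FLUX codebase.

```python
import numpy as np


class ResBlock:
    """Original block: shortcut plus a heavy nonlinear residual branch f(x)."""

    def __init__(self, dim: int, hidden: int):
        rng = np.random.default_rng(0)
        self.w1 = rng.standard_normal((dim, hidden)) * 0.02
        self.w2 = rng.standard_normal((hidden, dim)) * 0.02

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = np.maximum(x @ self.w1, 0.0)  # structurally complex branch (MLP + ReLU)
        return x + h @ self.w2            # shortcut connection preserved


class BRLLBlock:
    """BRLL replacement: the residual branch collapses to one linear layer."""

    def __init__(self, dim: int):
        # Zero init: the block starts as a pure identity (shortcut only),
        # so the replacement is stable before any fine-tuning.
        self.w = np.zeros((dim, dim))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x + x @ self.w             # shortcut connection still preserved
```

With zero-initialized weights the pruned block initially passes its input through unchanged, which is one plausible way the retained shortcut stabilizes training after replacement; the linear branch is then fitted (e.g., under the LoRA-based Sandwich Training supervision described above).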