ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation

Diffusion models achieve remarkable generation quality, but their reliance on many sequential denoising steps makes inference expensive, which has motivated recent efforts to distill the inference process into a few-step regime. However, existing distillation methods typically approximate the teacher trajectory with linear shortcuts, which cannot easily match the trajectory's tangent direction as the velocity evolves across timesteps, and this mismatch degrades quality. To address this limitation, we propose ArcFlow, a few-step distillation framework that explicitly employs non-linear flow trajectories to approximate pre-trained teacher trajectories. Concretely, ArcFlow parameterizes the velocity field underlying the inference trajectory as a mixture of continuous momentum processes. This enables ArcFlow to capture velocity evolution and extrapolate coherent velocities to form a continuous non-linear trajectory within each denoising step. Importantly, this parameterization admits analytical integration of the non-linear trajectory, which circumvents numerical discretization errors and yields a high-precision approximation of the teacher trajectory. To train this parameterization into a few-step generator, we implement ArcFlow via trajectory distillation on pre-trained teacher models using lightweight adapters. This strategy ensures fast, stable convergence while preserving generative diversity and quality. Built on large-scale models (Qwen-Image-20B and FLUX.1-dev), ArcFlow fine-tunes fewer than 5% of the original parameters and achieves a 40x speedup with 2 NFEs (number of function evaluations) over the original multi-step teachers without significant quality degradation. Experiments on benchmarks demonstrate the effectiveness of ArcFlow both qualitatively and quantitatively.
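To make the role of analytical integration concrete, the following is a minimal illustrative sketch, not the ArcFlow implementation. It assumes a hypothetical scalar velocity of the form v(s) = sum_k w_k * v_k * exp(r_k * s) within one denoising step; the exponential form and the names `weights`, `base_velocities`, and `rates` are assumptions made here for illustration and are not taken from the abstract. Under that assumption, the displacement over a step has a closed form, which can be compared against a single Euler step (a linear shortcut) and a finely discretized numerical integral.

```python
import math

def analytic_displacement(weights, base_velocities, rates, dt):
    """Closed-form integral of v(s) = sum_k w_k * v_k * exp(r_k * s) for s in [0, dt].

    The exponential form is an illustrative assumption, not the paper's exact
    parameterization.
    """
    total = 0.0
    for w, v, r in zip(weights, base_velocities, rates):
        if abs(r) < 1e-12:
            # Limit r -> 0: constant velocity, i.e. a straight-line segment.
            total += w * v * dt
        else:
            # Integral of exp(r * s) over [0, dt] is (exp(r * dt) - 1) / r.
            total += w * v * math.expm1(r * dt) / r
    return total

def euler_displacement(weights, base_velocities, rates, dt, n_steps):
    """Left-endpoint Euler integration of the same velocity, for comparison."""
    h = dt / n_steps
    total = 0.0
    for i in range(n_steps):
        s = i * h
        v = sum(w * vk * math.exp(r * s)
                for w, vk, r in zip(weights, base_velocities, rates))
        total += v * h
    return total

# Hypothetical mixture with two components (values chosen only for the demo).
weights = [0.6, 0.4]
base_velocities = [1.0, -0.5]
rates = [0.8, -1.5]
dt = 0.5

exact = analytic_displacement(weights, base_velocities, rates, dt)
coarse = euler_displacement(weights, base_velocities, rates, dt, 1)    # one linear step
fine = euler_displacement(weights, base_velocities, rates, dt, 10000)  # near-exact reference

print("closed form:", exact)
print("1-step Euler error:", abs(exact - coarse))
print("10000-step Euler error:", abs(exact - fine))
```

In this toy setting the closed form agrees with the finely discretized integral, while the single Euler step is noticeably off because it freezes the velocity at its initial value. In an actual model the quantities would be tensors predicted by the network rather than fixed scalars.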
