Flow-matching models deliver state-of-the-art fidelity in image and video generation, but their inherently sequential denoising process makes inference slow. Existing acceleration methods such as distillation, trajectory truncation, and consistency approaches are static, require retraining, and often fail to generalize across tasks. We propose FastFlow, a plug-and-play adaptive inference framework that accelerates generation in flow-matching models. FastFlow identifies denoising steps that make only minor adjustments to the denoising path and approximates them without invoking the full neural network used for velocity prediction. The approximation uses finite-difference velocity estimates from prior predictions to extrapolate future states efficiently, advancing along the denoising path at negligible compute cost and allowing computation to be skipped at intermediate steps. We model the decision of how many steps can be safely skipped before the next full model evaluation as a multi-armed bandit problem; the bandit learns skip lengths that balance speed against output quality. FastFlow integrates seamlessly with existing pipelines and generalizes across image generation, video generation, and editing tasks. Experiments demonstrate speedups of over 2.6x while maintaining high-quality outputs. The source code is available at https://github.com/Div290/FastFlow.