We propose continuous adversarial flow models, a type of continuous-time flow model trained with an adversarial objective. Unlike flow matching, which uses a fixed mean-squared-error criterion, our approach introduces a learned discriminator to guide training. This change in objective induces a different generalized distribution, which empirically produces samples that are better aligned with the target data distribution. Our method is primarily proposed for post-training existing flow-matching models, although it can also train models from scratch. On the ImageNet 256px generation task, our post-training substantially improves the guidance-free FID of latent-space SiT from 8.26 to 3.63 and of pixel-space JiT from 7.17 to 3.57. It also improves guided generation, reducing FID from 2.06 to 1.53 for SiT and from 1.86 to 1.80 for JiT. We further evaluate our approach on text-to-image generation, where it achieves improved results on both the GenEval and DPG benchmarks.
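To make the contrast between the two kinds of objective concrete, the following is a minimal sketch and not the objective used in this work. It assumes a linear interpolation path between noise and data for the flow-matching term, and it substitutes a generic non-saturating GAN loss on scalar discriminator logits for the adversarial term. All function names here are hypothetical and chosen only for illustration.

```python
import math


def interpolate(x0, x1, t):
    """Point on the assumed linear path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(v_pred, x0, x1):
    """Fixed MSE criterion: regress the predicted velocity onto x1 - x0."""
    target = [b - a for a, b in zip(x0, x1)]
    return sum((p - q) ** 2 for p, q in zip(v_pred, target)) / len(target)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def discriminator_loss(logit_real, logit_fake):
    """Generic GAN discriminator loss (an assumption, not the paper's loss)."""
    return softplus(-logit_real) + softplus(logit_fake)


def generator_loss(logit_fake):
    """Generic non-saturating generator loss (an assumption, as above)."""
    return softplus(-logit_fake)
```

In the first case the training signal is a fixed regression target. In the second it comes from a discriminator that is itself learned, which is the distinction the abstract draws.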