Flow models parameterized as time-dependent velocity fields can generate data from noise by integrating an ODE. These models are often trained using flow matching, i.e. by sampling random pairs of noise and target points $(\mathbf{x}_0,\mathbf{x}_1)$ and ensuring that the velocity field is aligned, on average, with $\mathbf{x}_1-\mathbf{x}_0$ when evaluated along the segment linking $\mathbf{x}_0$ to $\mathbf{x}_1$. While these pairs are sampled independently by default, they can also be selected more carefully by matching batches of $n$ noise points to $n$ target points using an optimal transport (OT) solver. Although promising in theory, the OT flow matching (OT-FM) approach is not widely used in practice. Zhang et al. (2025) recently pointed out that OT-FM only starts paying off when the batch size $n$ grows significantly, which only a multi-GPU implementation of the Sinkhorn algorithm can handle. Unfortunately, the cost of running Sinkhorn can quickly balloon, requiring $O(n^2/\varepsilon^2)$ operations for every $n$ pairs used to fit the velocity field, where $\varepsilon$ is a regularization parameter that should typically be small to yield better results. To fulfill the theoretical promises of OT-FM, we propose to move away from batch-OT and rely instead on a semidiscrete OT (SD-OT) formulation that leverages the fact that the target dataset distribution is usually of finite size $N$. The SD-OT problem is solved by estimating a dual potential vector using SGD; using that vector, noise vectors freshly sampled at train time can then be matched with data points at the cost of a maximum inner product search (MIPS). Semidiscrete FM (SD-FM) removes the quadratic dependency on $n/\varepsilon$ that bottlenecks OT-FM. SD-FM outperforms both FM and OT-FM on all training metrics and under all inference budget constraints, across multiple datasets, for unconditional and conditional generation, and when using mean-flow models.
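To make the two stages described above concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes a squared Euclidean cost, uniform weights $1/N$ on the data points, standard Gaussian noise, and no entropic regularization; the function names (`assign`, `fit_potential`, `sample_fm_batch`), the step-size schedule, and the brute-force `argmax` (standing in for a real MIPS index) are all illustrative choices. Under these assumptions, matching a noise point $\mathbf{x}$ to $\arg\min_j \tfrac{1}{2}\|\mathbf{x}-\mathbf{y}_j\|^2 - g_j$ is equivalent to maximizing $\langle \mathbf{x}, \mathbf{y}_j\rangle + g_j - \tfrac{1}{2}\|\mathbf{y}_j\|^2$, which is an inner product search.

```python
import numpy as np


def assign(x, y, g):
    """Match each noise row of x to a data index using the dual potential g."""
    # argmin_j 0.5*||x - y_j||^2 - g_j  equals
    # argmax_j <x, y_j> + (g_j - 0.5*||y_j||^2), i.e. an inner product search
    # between [x, 1] and [y_j, g_j - 0.5*||y_j||^2].
    bias = g - 0.5 * np.sum(y**2, axis=1)
    return np.argmax(x @ y.T + bias, axis=1)  # brute force; a MIPS index could replace this


def fit_potential(y, steps=2000, batch=256, lr=0.5, seed=0):
    """Stochastic gradient ascent on the semi-dual objective (assumed setup)."""
    rng = np.random.default_rng(seed)
    n_data, dim = y.shape
    g = np.zeros(n_data)
    target_mass = np.full(n_data, 1.0 / n_data)  # assumed uniform data weights
    for t in range(steps):
        x = rng.standard_normal((batch, dim))  # fresh noise batch
        matched = np.bincount(assign(x, y, g), minlength=n_data) / batch
        # Stochastic gradient: target mass minus the mass currently matched to each point.
        g += lr / np.sqrt(t + 1.0) * (target_mass - matched)
    return g


def sample_fm_batch(y, g, batch, rng):
    """Draw (x_t, t, x_1 - x_0) flow matching triples with semidiscrete pairing."""
    x0 = rng.standard_normal((batch, y.shape[1]))
    x1 = y[assign(x0, y, g)]
    t = rng.uniform(size=(batch, 1))
    xt = (1.0 - t) * x0 + t * x1  # point on the segment from x0 to x1
    return xt, t, x1 - x0
```

The potential is fitted once, before training the velocity field. Each training step then only needs one `assign` call per fresh noise batch, so its cost does not depend on a per-batch Sinkhorn solve.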