Building Normalizing Flows with Stochastic Interpolants

A generative model based on a continuous-time normalizing flow between any pair of base and target probability densities is proposed. The velocity field of this flow is inferred from the probability current of a time-dependent density that interpolates between the base and the target in finite time. Unlike conventional normalizing flow inference methods based on the maximum likelihood principle, which require costly backpropagation through ODE solvers, our interpolant approach leads to a simple quadratic loss for the velocity itself, expressed in terms of expectations that are readily amenable to empirical estimation. The flow can be used to generate samples from either the base or the target, and to estimate the likelihood at any time along the interpolant. In addition, the flow can be optimized to minimize the path length of the interpolant density, thereby paving the way for building optimal transport maps. When the base is a Gaussian density, we also show that the velocity of our normalizing flow can be used to construct a diffusion model to sample the target as well as estimate its score. However, our approach shows that this diffusion can be bypassed completely by working directly at the level of the probability flow, which is simpler and opens an avenue for methods based solely on ordinary differential equations as an alternative to those based on stochastic differential equations. Benchmarking on density estimation tasks illustrates that the learned flow can match and even surpass conventional continuous flows at a fraction of the cost, and compares well with diffusion models for image generation on CIFAR-10 and ImageNet $32\times32$. The method scales ab initio ODE flows to previously unreachable image resolutions, demonstrated up to $128\times128$.
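To make the "simple quadratic loss for the velocity" concrete, here is a minimal sketch under several stated assumptions that are not taken from the abstract: a 1-D problem with a standard Gaussian base and a Gaussian target $N(3, 0.5^2)$, independent base/target samples, the trigonometric interpolant $I(t, x_0, x_1) = \cos(\pi t/2)\,x_0 + \sin(\pi t/2)\,x_1$, and a hypothetical velocity model that is affine in $x$ with one pair of coefficients per grid time (instead of a neural network). The empirical loss $\mathbb{E}\big[|v(t, I_t)|^2 - 2\,\partial_t I_t \cdot v(t, I_t)\big]$ equals $\mathbb{E}\big[|v - \partial_t I_t|^2\big]$ minus a $v$-independent constant, so for a model that is linear in its parameters the minimizer is an ordinary least-squares fit and no ODE is solved during training; the ODE is integrated only afterwards, to push base samples to the target. All function names (`interpolant`, `interpolant_loss`, `fit_velocity`, `push_forward`) are illustrative.

```python
import numpy as np


def interpolant(t, x0, x1):
    """Trigonometric interpolant I(t, x0, x1) and its time derivative dI/dt (assumed choice)."""
    a, b = np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)
    da, db = -0.5 * np.pi * b, 0.5 * np.pi * a
    return a * x0 + b * x1, da * x0 + db * x1


def interpolant_loss(v, dxt):
    """Empirical quadratic loss: mean of |v|^2 - 2 (dI/dt) v over paired samples."""
    return np.mean(v ** 2 - 2.0 * dxt * v)


def fit_velocity(sample_base, sample_target, n_steps=100, n_samples=20_000, seed=0):
    """Fit a hypothetical velocity model v(t_k, x) = c_k + d_k * x on a uniform time grid.

    The loss equals mean((v - dI/dt)^2) - mean((dI/dt)^2), so for a model that is
    linear in its parameters the minimizer is a least-squares fit of dI/dt on the
    features [1, x_t]. No ODE is solved during fitting.
    """
    rng = np.random.default_rng(seed)
    times = np.linspace(0.0, 1.0, n_steps + 1)
    coeffs = np.empty((n_steps + 1, 2))
    for k, t in enumerate(times):
        # Independent draws from base and target (a simple, assumed coupling).
        x0, x1 = sample_base(rng, n_samples), sample_target(rng, n_samples)
        xt, dxt = interpolant(t, x0, x1)
        features = np.stack([np.ones_like(xt), xt], axis=1)
        coeffs[k], *_ = np.linalg.lstsq(features, dxt, rcond=None)
    return times, coeffs


def push_forward(x, times, coeffs):
    """Integrate dX/dt = v(t, X) from t=0 to t=1 with Heun's method on the fitting grid."""
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        v0 = coeffs[k, 0] + coeffs[k, 1] * x
        x_pred = x + dt * v0
        v1 = coeffs[k + 1, 0] + coeffs[k + 1, 1] * x_pred
        x = x + 0.5 * dt * (v0 + v1)
    return x


def standard_normal_base(rng, n):
    """Base density: N(0, 1)."""
    return rng.standard_normal(n)


def shifted_gaussian_target(rng, n):
    """Hypothetical target density: N(3, 0.5^2)."""
    return 3.0 + 0.5 * rng.standard_normal(n)


if __name__ == "__main__":
    times, coeffs = fit_velocity(standard_normal_base, shifted_gaussian_target)
    samples = push_forward(standard_normal_base(np.random.default_rng(1), 50_000), times, coeffs)
    print(samples.mean(), samples.std())  # expected to be close to 3.0 and 0.5
```

In this Gaussian-to-Gaussian toy case the exact velocity $\mathbb{E}[\partial_t I_t \mid I_t = x]$ is affine in $x$, so the affine model can represent it; for general targets one would swap in a richer model (for example a neural network trained by stochastic gradient descent on the same loss).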