Flow Matching (FM) is a simulation-free method for learning a continuous, invertible flow that interpolates between two distributions, and in particular generates data from noise. Inspired by the variational nature of the diffusion process as a gradient flow, we introduce a stepwise FM model, Local Flow Matching (LFM), which learns a sequence of FM sub-models one after another, each matching a diffusion process up to the time-step size in the data-to-noise direction. In each step, the two distributions interpolated by the sub-model are closer to each other than those in the full FM model, which interpolates between the data and noise distributions; this enables smaller models and more efficient training. The variational perspective also allows us to prove a theoretical generation guarantee for the proposed flow model in terms of the $\chi^2$-divergence between the generated and true data distributions, leveraging the contraction property of the diffusion process. In practice, the stepwise structure of LFM is naturally amenable to model distillation, and various distillation techniques can be applied to accelerate generation. We empirically demonstrate that LFM achieves competitive generative performance compared to FM on unconditional generation of tabular and image datasets, and on conditional generation of robotic manipulation policies.