We present theoretical convergence guarantees for ODE-based generative models, specifically flow matching. A pre-trained autoencoder maps the high-dimensional inputs to a low-dimensional latent space, where a transformer network is trained to predict the velocity field that transports a standard normal distribution to the target latent distribution. Our error analysis supports this approach, showing that the distribution of samples generated by the estimated ODE flow converges to the target distribution in the Wasserstein-2 distance under mild and practical assumptions. Furthermore, we show that arbitrary smooth functions can be effectively approximated by Lipschitz-continuous transformer networks, a result that may be of independent interest.
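To make the pipeline concrete, the following is a minimal one-dimensional sketch of flow matching, not the setting analyzed in the paper. It assumes a linear interpolation path between noise and data, a Gaussian toy target standing in for the latent distribution, and a small least-squares model `a(t) * x + b(t)` standing in for the transformer. The autoencoder is omitted, and all function names are hypothetical. The sketch fits the velocity field by regression, integrates the estimated ODE with Euler steps, and reports a Wasserstein-2 distance to the target using the closed form for one-dimensional Gaussians.

```python
import numpy as np


def features(x, t, degree=6):
    # Toy velocity model v(x, t) = a(t) * x + b(t), with a and b polynomials in t.
    # Assumption: this stands in for the transformer network of the abstract.
    powers = t[:, None] ** np.arange(degree + 1)          # shape (n, degree + 1)
    return np.concatenate([powers * x[:, None], powers], axis=1)


def fit_velocity(x1, rng, degree=6):
    # Flow matching regression on the linear path x_t = (1 - t) * z + t * x1,
    # whose conditional velocity target is x1 - z.
    n = x1.shape[0]
    z = rng.standard_normal(n)
    t = rng.uniform(0.0, 1.0, n)
    xt = (1.0 - t) * z + t * x1
    theta, *_ = np.linalg.lstsq(features(xt, t, degree), x1 - z, rcond=None)
    return theta


def sample(theta, z, steps=200, degree=6):
    # Explicit Euler integration of dx/dt = v_theta(x, t) from t = 0 to t = 1.
    x = z.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full_like(x, i * dt)
        x = x + dt * (features(x, t, degree) @ theta)
    return x


def w2_gaussian(samples, mu, sigma):
    # W2 distance between N(mean, std^2) fitted to the samples and N(mu, sigma^2).
    # Valid here because an affine-in-x flow keeps Gaussian noise Gaussian.
    return float(np.sqrt((samples.mean() - mu) ** 2 + (samples.std() - sigma) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, sigma = 3.0, 2.0                                   # hypothetical toy target
    x1 = mu + sigma * rng.standard_normal(50_000)          # "latent" training data
    theta = fit_velocity(x1, rng)
    z = rng.standard_normal(20_000)
    print("W2 before flow:", w2_gaussian(z, mu, sigma))
    print("W2 after flow: ", w2_gaussian(sample(theta, z), mu, sigma))
```

In this toy setting the remaining distance after the flow combines three sources of error: the finite training sample, the limited capacity of the polynomial model, and the Euler step size. The abstract does not say how its analysis decomposes the error, so this is only an observation about the sketch.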