We present theoretical convergence guarantees for ODE-based generative models, specifically flow matching. We use a pre-trained autoencoder network to map high-dimensional inputs to a low-dimensional latent space, where a transformer network is trained to predict the velocity field that transports a standard normal distribution to the target latent distribution. Our error analysis shows that the distribution of samples generated via the estimated ODE flow converges to the target distribution in the Wasserstein-2 distance under mild and practical assumptions. Furthermore, we show that arbitrary smooth functions can be effectively approximated by Lipschitz-continuous transformer networks, which may be of independent interest.
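To make the flow matching setup concrete, the following is a minimal one-dimensional sketch, not the method analyzed in this work. It assumes a straight-line interpolation path between a standard normal sample and a target sample, a Gaussian target with hypothetical parameters `mu` and `sigma`, and a per-time-step linear least-squares fit in place of the transformer velocity estimator; no autoencoder is involved. The function names `fit_velocity`, `sample`, and `gaussian_w2` are illustrative only.

```python
import numpy as np


def fit_velocity(t, mu, sigma, n, rng):
    """Least-squares estimate of the velocity field at time t.

    Assumption: a linear model v(x, t) = a * x + b stands in for the
    transformer network. It is exact here only because both endpoints
    are Gaussian and independently coupled.
    """
    z = rng.standard_normal(n)                 # source: N(0, 1)
    x1 = mu + sigma * rng.standard_normal(n)   # toy target: N(mu, sigma^2)
    xt = (1.0 - t) * z + t * x1                # point on the straight-line path
    target = x1 - z                            # flow matching regression target
    features = np.stack([xt, np.ones(n)], axis=1)
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef                                # (a, b)


def sample(mu=2.0, sigma=0.5, steps=100, n=20000, seed=0):
    """Push N(0, 1) samples through the estimated ODE with forward Euler."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    h = 1.0 / steps
    for k in range(steps):
        a, b = fit_velocity(k * h, mu, sigma, n, rng)
        x = x + h * (a * x + b)                # Euler step of dx/dt = v(x, t)
    return x


def gaussian_w2(m1, s1, m2, s2):
    """Closed-form Wasserstein-2 distance between two 1-D Gaussians."""
    return float(np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2))


if __name__ == "__main__":
    xs = sample()
    # Generated samples are Gaussian in this toy case, so their mean and
    # standard deviation give a plug-in estimate of the W2 error.
    print(gaussian_w2(xs.mean(), xs.std(), 2.0, 0.5))
```

In this toy setting the generated samples remain Gaussian, so the plug-in Wasserstein-2 error reflects only the velocity estimation error and the Euler discretization error, which are two of the error sources a convergence analysis of this kind has to control.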