Flow matching is a recent framework for training generative models that achieves strong empirical performance while being easier to train than diffusion-based models. Despite these advantages, prior methods still suffer from high computational cost and a large number of function evaluations when off-the-shelf solvers are applied in pixel space. Furthermore, although latent-based generative methods have shown great success in recent years, they remain underexplored for flow matching. In this work, we propose applying flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency and scalability for high-resolution image synthesis. This enables flow matching training on constrained computational resources while maintaining quality and flexibility. Additionally, our work is a pioneering contribution in integrating various conditions into flow matching for conditional generation tasks, including label-conditioned image generation, image inpainting, and semantic-to-image generation. Through extensive experiments, our approach demonstrates its effectiveness both quantitatively and qualitatively on various datasets, including CelebA-HQ, FFHQ, LSUN Church & Bedroom, and ImageNet. We also provide a theoretical control of the Wasserstein-2 distance between the reconstructed latent flow distribution and the true data distribution, showing that it is upper-bounded by the latent flow matching objective. Our code will be available at https://github.com/VinAIResearch/LFM.git.
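To make the idea of flow matching in a latent space concrete, the following is a minimal sketch and not the released implementation. It assumes a straight-line probability path with Gaussian noise at t = 0 and the data latent at t = 1; the paper's own time convention and path may differ. All function names (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`) are hypothetical, and the pretrained autoencoder is left out: `z1` stands in for the encoder output of an image, and a sampled latent would be passed to the decoder.

```python
import random

def interpolate(z0, z1, t):
    """Point on the straight path from noise z0 (t=0) to data latent z1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]

def target_velocity(z0, z1):
    """Velocity of the straight path, d z_t / dt = z1 - z0 (constant in t)."""
    return [b - a for a, b in zip(z0, z1)]

def flow_matching_loss(velocity_fn, z1, rng):
    """One-sample Monte Carlo estimate of the flow matching loss for latent z1.

    velocity_fn(z, t) is a stand-in for the learned velocity network
    (assumption: it takes a latent vector and a scalar time).
    """
    z0 = [rng.gauss(0.0, 1.0) for _ in z1]   # noise sample
    t = rng.random()                          # time drawn uniformly from [0, 1)
    zt = interpolate(z0, z1, t)
    pred = velocity_fn(zt, t)
    target = target_velocity(z0, z1)
    # Mean squared error between predicted and target velocity.
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(z1)

def euler_sample(velocity_fn, z0, num_steps):
    """Integrate dz/dt = v(z, t) from t=0 to t=1 with fixed-step Euler."""
    z = list(z0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(z, t)
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z
```

In this sketch the cost of each velocity evaluation scales with the latent dimension rather than the pixel count, which is the source of the efficiency gain the abstract describes. The fixed-step Euler loop is used only for illustration; any ODE solver could take its place.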