Flow Matching has emerged as a powerful framework for learning continuous transformations between distributions, enabling high-fidelity generative modeling. This work introduces Symmetrical Flow Matching (SymmFlow), a new formulation that unifies semantic segmentation, classification, and image generation within a single model. Using a symmetric learning objective, SymmFlow models the forward and reverse transformations jointly, ensuring bi-directional consistency while preserving sufficient entropy for generative diversity. A new training objective explicitly retains semantic information across flows. This enables efficient sampling while preserving semantic structure, and allows one-step segmentation and classification without iterative refinement. Unlike previous approaches that impose a strict one-to-one mapping between masks and images, SymmFlow generalizes to flexible conditioning, supporting both pixel-level and image-level class labels. Experiments on several benchmarks show that SymmFlow achieves state-of-the-art performance on semantic image synthesis, obtaining FID scores of 11.9 on CelebAMask-HQ and 7.0 on COCO-Stuff with only 25 inference steps. It also delivers competitive results on semantic segmentation and shows promising capabilities in classification tasks.
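For readers unfamiliar with the underlying framework, the following is a minimal sketch of generic flow matching on a straight-line path, not the SymmFlow objective itself, which the abstract does not specify. It assumes the common linear interpolation x_t = (1 - t) x_0 + t x_1 with target velocity x_1 - x_0, and a fixed-step Euler integrator that can run in either direction, with the step count defaulting to the 25 steps quoted above. All function names (`interpolate`, `flow_matching_loss`, `euler_sample`) and the `velocity_fn` callback are hypothetical and chosen for illustration only.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity.

    Assumption: linear path, so the target velocity is constant (x1 - x0).
    """
    x_t = interpolate(x0, x1, t)
    target = x1 - x0
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - target) ** 2))


def euler_sample(velocity_fn, x_start, num_steps=25, reverse=False):
    """Integrate dx/dt = v(x, t) with fixed-step Euler.

    Forward runs t from 0 to 1; reverse runs t from 1 to 0 using the
    same velocity field, which is one simple way to traverse a flow in
    both directions (an illustrative choice, not the paper's method).
    """
    x = np.array(x_start, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt if reverse else i * dt
        v = velocity_fn(x, t)
        x = x - dt * v if reverse else x + dt * v
    return x
```

With an oracle velocity field that returns exactly `x1 - x0`, the loss is zero and Euler integration recovers `x1` from `x0` (and `x0` from `x1` in reverse). A learned network would replace the oracle in practice.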