Semantic segmentation and semantic image synthesis are two representative tasks in visual perception and generation. While existing methods treat them as two distinct tasks, we propose a unified diffusion-based framework (SemFlow) that models them as a pair of reverse problems. Specifically, motivated by rectified flow theory, we train an ordinary differential equation (ODE) model to transport between the distribution of real images and the distribution of semantic masks. Because the training objective is symmetric, samples from either distribution can be transported to the other by integrating the same ODE in the forward or reverse direction. For semantic segmentation, our approach resolves the contradiction between the randomness of diffusion outputs and the uniqueness of segmentation results. For image synthesis, we propose a finite perturbation approach that enhances the diversity of generated results without changing the semantic categories. Experiments show that SemFlow achieves competitive results on both semantic segmentation and semantic image synthesis. We hope this simple framework will motivate researchers to rethink the unification of low-level and high-level vision. Project page: https://github.com/wang-chaoyang/SemFlow.
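To make the symmetric transport concrete, the following is a minimal sketch of the generic rectified flow recipe on a toy problem, not the SemFlow implementation. It assumes the standard straight-line interpolation x_t = (1 - t) x_0 + t x_1 with regression target x_1 - x_0. The names `interpolate`, `rectified_flow_loss`, `euler_transport`, `toy_velocity`, and `perturb_mask` are hypothetical. The toy pairing x_1 = 2 x_0 + 1 and its closed-form velocity stand in for a trained network, and the bounded uniform noise in `perturb_mask` is an assumed form of the finite perturbation, which the abstract does not specify.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight line between x0 (image side) and x1 (mask side)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def rectified_flow_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the predicted velocity at x_t and x1 - x0."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    target = [b - a for a, b in zip(x0, x1)]
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(target)


def euler_transport(velocity_fn, x, t_start, t_end, num_steps=10):
    """Integrate dx/dt = v(x, t) from t_start to t_end with explicit Euler steps.

    t_start=0, t_end=1 maps image -> mask; t_start=1, t_end=0 maps mask -> image.
    """
    dt = (t_end - t_start) / num_steps
    t = t_start
    for _ in range(num_steps):
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


def toy_velocity(x, t):
    """Closed-form velocity for the toy pairing x1 = 2 * x0 + 1 (assumption).

    It stands in for a trained network in this sketch.
    """
    return [(xi + 1.0) / (1.0 + t) for xi in x]


def perturb_mask(mask, eps, rng):
    """Add bounded uniform noise in [-eps, eps] to each mask entry (assumed form)."""
    return [m + rng.uniform(-eps, eps) for m in mask]


# Toy "image" and its paired "mask" under the assumed mapping.
image = [0.2, -0.5, 1.0]
mask = [2.0 * a + 1.0 for a in image]

# Forward integration plays the role of segmentation, backward of synthesis.
pred_mask = euler_transport(toy_velocity, image, 0.0, 1.0)
recon_image = euler_transport(toy_velocity, mask, 1.0, 0.0)

# Perturbing the mask before the backward pass yields a different sample.
noisy_mask = perturb_mask(mask, 0.1, random.Random(0))
varied_image = euler_transport(toy_velocity, noisy_mask, 1.0, 0.0)
```

Because the toy trajectories are straight lines, a handful of Euler steps already recovers the paired sample in both directions. With a learned velocity field the number of steps needed depends on how straight the learned trajectories are.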