Rectified flow models have emerged as a dominant approach in image generation, showcasing impressive capabilities in high-quality image synthesis. However, despite their effectiveness in visual generation, rectified flow models often struggle with disentangled image editing, i.e., precise, attribute-specific modifications that leave unrelated aspects of the image unchanged. In this paper, we introduce FluxSpace, a domain-agnostic image editing method that leverages a representation space capable of controlling the semantics of images generated by rectified flow transformers, such as Flux. Building on the representations learned by the transformer blocks within rectified flow models, we propose a set of semantically interpretable representations that enable a wide range of image editing tasks, from fine-grained edits to artistic creation. This work offers a scalable and effective image editing approach with strong disentanglement capabilities.