Recent unified models such as Bagel demonstrate that paired image-edit data can effectively align multiple visual tasks within a single diffusion transformer. However, these models remain limited to single-condition inputs and lack the flexibility needed to synthesize results from multiple heterogeneous sources. We present SIGMA (Selective-Interleaved Generation with Multi-Attribute Tokens), a unified post-training framework that enables interleaved multi-condition generation within diffusion transformers. SIGMA introduces selective multi-attribute tokens, including style, content, subject, and identity tokens, which allow the model to interpret and compose multiple visual conditions in an interleaved text-image sequence. Through post-training on the Bagel unified backbone with 700K interleaved examples, SIGMA supports compositional editing, selective attribute transfer, and fine-grained multimodal alignment. Extensive experiments show that SIGMA improves controllability, cross-condition consistency, and visual quality across diverse editing and generation tasks, with substantial gains over Bagel on compositional tasks.
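To make the interleaved multi-condition input format concrete, the following is a minimal sketch of how text spans and attribute-tagged reference images might be flattened into a single sequence. The token names (`<style>`, `<content>`, `<subject>`, `<identity>`), the `<img:...>` placeholder convention, and all function names are illustrative assumptions, not SIGMA's actual vocabulary or implementation.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical special tokens marking which attribute of a condition image
# the model should attend to; names are illustrative, not SIGMA's real vocab.
ATTRIBUTE_TOKENS = {
    "style": "<style>",
    "content": "<content>",
    "subject": "<subject>",
    "identity": "<identity>",
}

@dataclass
class ImageCondition:
    """A reference image plus the attribute to be extracted from it."""
    image_path: str
    attribute: str  # one of the ATTRIBUTE_TOKENS keys

def build_interleaved_sequence(
    segments: List[Union[str, ImageCondition]],
) -> List[str]:
    """Flatten text spans and attribute-tagged image conditions into one
    interleaved sequence of the kind the abstract describes.

    Text segments pass through unchanged; each image condition becomes an
    attribute token followed by an image placeholder, so a downstream
    tokenizer could route image features under the selected attribute.
    This is a data-format sketch only, not a model component.
    """
    sequence: List[str] = []
    for seg in segments:
        if isinstance(seg, str):
            sequence.append(seg)
        else:
            sequence.append(ATTRIBUTE_TOKENS[seg.attribute])
            sequence.append(f"<img:{seg.image_path}>")
    return sequence

if __name__ == "__main__":
    # Example: compose an identity from one reference and a style from another.
    prompt = build_interleaved_sequence([
        "Render the person from",
        ImageCondition("ref_face.png", "identity"),
        "in the painting style of",
        ImageCondition("ref_style.png", "style"),
        "standing in a snowy street.",
    ])
    print(" ".join(prompt))
```

Under these assumptions, each condition image enters the sequence only through the attribute the user selects, which is one plausible way to realize the selective, interleaved composition of heterogeneous sources that the abstract claims.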