We introduce a parameter-efficient adaptation method for panel-aware in-context image generation with pre-trained diffusion transformers. The key idea is to compose learnable, panel-specific orthogonal operators with the backbone's frozen positional encodings. This design provides two desirable properties: (1) isometry, which preserves the geometry of internal features, and (2) same-panel invariance, which maintains the model's pre-trained intra-panel synthesis behavior. Through controlled experiments, we demonstrate that the effectiveness of our method is not tied to a specific positional encoding design but generalizes across diverse positional encoding regimes. By enabling effective panel-relative conditioning, the proposed method consistently improves in-context, image-based instructional editing pipelines, including state-of-the-art approaches.
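The two properties named above can be checked numerically in a small toy setting. The sketch below is illustrative only and is not the paper's implementation: it assumes a rotary-style (RoPE) positional encoding as the frozen backbone encoding, uses random orthogonal matrices as stand-ins for the learned panel operators, and the helper names (`rope_rotate`, `random_orthogonal`, `panel_encode`) are hypothetical.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Frozen positional encoding (assumed here to be a 1-D rotary encoding)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    # Rotate each (x1[i], x2[i]) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

def random_orthogonal(d, rng):
    """Stand-in for a learned panel operator: a random orthogonal matrix via QR."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def panel_encode(x, pos, panel_op):
    """Compose the panel-specific orthogonal operator with the frozen encoding."""
    return panel_op @ rope_rotate(x, pos)

rng = np.random.default_rng(0)
d = 8
panel_ops = [random_orthogonal(d, rng) for _ in range(2)]  # one operator per panel
q = rng.standard_normal(d)  # toy query feature
k = rng.standard_normal(d)  # toy key feature

# (1) Isometry: an orthogonal operator leaves feature norms unchanged.
norm_before = np.linalg.norm(rope_rotate(q, 3))
norm_after = np.linalg.norm(panel_encode(q, 3, panel_ops[0]))

# (2) Same-panel invariance: R^T R = I, so the operator cancels in the
# attention logit whenever query and key belong to the same panel.
logit_frozen = rope_rotate(q, 3) @ rope_rotate(k, 7)
logit_same = panel_encode(q, 3, panel_ops[0]) @ panel_encode(k, 7, panel_ops[0])

# Across panels the operators do not cancel, so the logit can carry
# panel-relative information.
logit_cross = panel_encode(q, 3, panel_ops[0]) @ panel_encode(k, 7, panel_ops[1])

print(np.isclose(norm_before, norm_after))
print(np.isclose(logit_frozen, logit_same))
print(abs(logit_frozen - logit_cross))
```

In this toy setup, the same-panel logit matches the frozen-encoding logit up to floating-point error, while the cross-panel logit generally differs. This is one way to read the claim that intra-panel behavior is preserved while panel-relative conditioning becomes possible.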