Recent advances in Generative Artificial Intelligence (GenAI) have significantly enhanced the capabilities of both image generation and editing. However, current approaches often treat these tasks separately, making it difficult to maintain spatial consistency and semantic coherence between generated content and subsequent edits. A further obstacle is the lack of structured control over object relationships and spatial arrangements. Scene graph-based methods, which represent objects and their interrelationships in a structured format, offer a solution by providing greater control over composition and interactions in both image generation and editing. Building on this insight, we introduce SimGraph, a unified framework that integrates scene graph-based image generation and editing, enabling precise control over object interactions, layouts, and spatial coherence. In particular, our framework combines token-based generation and diffusion-based editing within a single scene graph-driven model, ensuring high-quality and consistent results. Through extensive experiments, we empirically demonstrate that our approach outperforms existing state-of-the-art methods.