Graph-PiT: Enhancing Structural Coherence in Part-Based Image Synthesis via Graph Priors

Achieving fine-grained and structurally sound controllability is a cornerstone of advanced visual generation. Existing part-based frameworks treat user-provided parts as an unordered set and therefore ignore their intrinsic spatial and semantic relationships, which often results in compositions that lack structural integrity. To bridge this gap, we propose Graph-PiT, a framework that explicitly models the structural dependencies of visual components using a graph prior. Specifically, we represent visual parts as nodes and their spatial-semantic relationships as edges. At the heart of our method is a Hierarchical Graph Neural Network (HGNN) module that performs bidirectional message passing between coarse-grained part-level super-nodes and fine-grained IP+ token sub-nodes, refining part embeddings before they enter the generative pipeline. We also introduce a graph Laplacian smoothness loss and an edge-reconstruction loss so that adjacent parts acquire compatible, relation-aware embeddings. Quantitative experiments on controlled synthetic domains (character, product, indoor layout, and jigsaw), together with qualitative transfer to real web images, show that Graph-PiT improves structural coherence over vanilla PiT while remaining compatible with the original IP-Prior pipeline. Ablation experiments confirm that explicit relational reasoning is crucial for enforcing user-specified adjacency constraints. Our approach not only enhances the plausibility of generated concepts but also offers a scalable and interpretable mechanism for complex, multi-part image synthesis. The code is available at https://github.com/wolf-bailang/Graph-PiT.
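The abstract names the two regularizers but does not give their formulas, so the following is only a minimal sketch under assumed forms. It takes the Laplacian smoothness term to be tr(X^T L X) with the unnormalized Laplacian L = D - A, and the edge-reconstruction term to be a binary cross-entropy between sigmoid(x_i . x_j) and the adjacency entry A_ij. The function names, the inner-product decoder, and the toy three-part chain graph are hypothetical and are not taken from the Graph-PiT code.

```python
import numpy as np

def laplacian_smoothness(X, A):
    """Assumed form: tr(X^T L X) with L = D - A.

    For a symmetric A this equals 0.5 * sum_ij A_ij * ||x_i - x_j||^2,
    so it is small when adjacent parts have similar embeddings.
    """
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(X.T @ L @ X))

def edge_reconstruction(X, A, eps=1e-9):
    """Assumed form: mean binary cross-entropy between sigmoid(x_i . x_j)
    and A_ij over all off-diagonal node pairs."""
    logits = X @ X.T
    probs = 1.0 / (1.0 + np.exp(-logits))
    bce = -(A * np.log(probs + eps) + (1.0 - A) * np.log(1.0 - probs + eps))
    off_diagonal = ~np.eye(A.shape[0], dtype=bool)
    return float(bce[off_diagonal].mean())

# Hypothetical toy graph: three parts connected in a chain 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# Identical embeddings for all parts versus embeddings that flip sign
# across each edge.
X_smooth = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
X_rough = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]])

print(laplacian_smoothness(X_smooth, A))  # 0.0
print(laplacian_smoothness(X_rough, A))   # 8.0
print(edge_reconstruction(X_smooth, A) < edge_reconstruction(X_rough, A))  # True
```

In this toy case both terms penalize the embeddings that disagree across an edge. In a training setup the two terms would presumably be weighted and added to the main generative objective, but the weights and the exact decoder are not specified in the abstract.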
