We introduce a novel approach that enhances text-to-image (T2I) models by incorporating graph-based Retrieval-Augmented Generation (RAG). Our system dynamically retrieves detailed character information and relational data from a knowledge graph, enabling the generation of visually accurate and contextually rich images. This capability addresses a key limitation of existing T2I models, which often struggle to accurately depict complex or culturally specific subjects due to dataset constraints. We further propose a self-correcting mechanism that leverages the rich context from the graph to guide corrections, ensuring consistency and fidelity in the visual outputs. Our qualitative and quantitative experiments demonstrate that Context Canvas significantly enhances popular models such as Flux, Stable Diffusion, and DALL-E, and improves the functionality of ControlNet for fine-grained image editing. To our knowledge, Context Canvas is the first application of graph-based RAG to T2I models, marking a significant advancement toward producing high-fidelity, context-aware, multi-faceted images.
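The retrieval-and-correction loop described above can be sketched in miniature. The graph schema, entity names, and helper functions below are illustrative assumptions, not the paper's actual implementation: a toy knowledge graph supplies attributes and one-hop relations for an entity, which enrich the user's T2I prompt, and a simple check flags retrieved facts missing from a caption of the generated image (a stand-in for the self-correcting mechanism).

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    # node name -> attribute dict (hypothetical schema for illustration)
    nodes: dict = field(default_factory=dict)
    # (subject, relation, object) triples
    edges: list = field(default_factory=list)

    def retrieve(self, entity):
        """Return the entity's attributes plus its one-hop relations."""
        attrs = self.nodes.get(entity, {})
        relations = [(r, o) for (s, r, o) in self.edges if s == entity]
        return attrs, relations

def enrich_prompt(base_prompt, entity, kg):
    """Append retrieved graph context to the user's T2I prompt."""
    attrs, relations = kg.retrieve(entity)
    details = ", ".join(f"{k}: {v}" for k, v in attrs.items())
    rels = "; ".join(f"{entity} {r} {o}" for r, o in relations)
    parts = [base_prompt]
    if details:
        parts.append(f"({entity}: {details})")
    if rels:
        parts.append(f"[context: {rels}]")
    return " ".join(parts)

def missing_facts(caption, entity, kg):
    """Self-correction check: which retrieved attribute values are
    absent from a caption of the generated image? Missing values
    would drive a corrective re-prompt."""
    attrs, _ = kg.retrieve(entity)
    return [v for v in attrs.values() if v.lower() not in caption.lower()]

# Example with a culturally specific subject (hypothetical data):
kg = KnowledgeGraph()
kg.nodes["Garuda"] = {"appearance": "golden wings", "form": "eagle-headed"}
kg.edges.append(("Garuda", "is the mount of", "Vishnu"))
prompt = enrich_prompt("A painting of Garuda", "Garuda", kg)
```

A generation step would use `prompt` in place of the raw user prompt; if `missing_facts` reports omissions for a caption of the result, the system would regenerate with those facts emphasized.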