We introduce a novel approach that enhances text-to-image (T2I) models by incorporating graph-based Retrieval-Augmented Generation (RAG). Our system dynamically retrieves detailed character information and relational data from a knowledge graph, enabling the generation of visually accurate and contextually rich images. This capability addresses a key limitation of existing T2I models, which often struggle to accurately depict complex or culturally specific subjects due to dataset constraints. We further propose a self-correcting mechanism that leverages the rich context from the graph to guide corrections, ensuring consistency and fidelity in the visual outputs. Our qualitative and quantitative experiments demonstrate that Context Canvas significantly enhances popular models such as Flux, Stable Diffusion, and DALL-E, and improves the functionality of ControlNet for fine-grained image editing. To our knowledge, Context Canvas is the first application of graph-based RAG to T2I models, marking a significant advancement toward producing high-fidelity, context-aware, multi-faceted images.
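The retrieval-and-correction loop described above can be sketched in miniature. The graph schema, entity names, and helper functions below are illustrative assumptions, not the paper's actual implementation: a toy knowledge graph supplies attributes and one-hop relations for an entity, which enrich the user's T2I prompt, and a simple check flags retrieved facts missing from a caption of the generated image (a stand-in for the self-correcting mechanism).

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    # node name -> attribute dict (hypothetical schema for illustration)
    nodes: dict = field(default_factory=dict)
    # (subject, relation, object) triples
    edges: list = field(default_factory=list)

    def retrieve(self, entity):
        """Return the entity's attributes plus its one-hop relations."""
        attrs = self.nodes.get(entity, {})
        relations = [(r, o) for (s, r, o) in self.edges if s == entity]
        return attrs, relations

def enrich_prompt(base_prompt, entity, kg):
    """Append retrieved graph context to the user's T2I prompt."""
    attrs, relations = kg.retrieve(entity)
    details = ", ".join(f"{k}: {v}" for k, v in attrs.items())
    rels = "; ".join(f"{entity} {r} {o}" for r, o in relations)
    parts = [base_prompt]
    if details:
        parts.append(f"({entity}: {details})")
    if rels:
        parts.append(f"[context: {rels}]")
    return " ".join(parts)

def missing_facts(caption, entity, kg):
    """Self-correction check: which retrieved attribute values are
    absent from a caption of the generated image? Missing values
    would drive a corrective re-prompt."""
    attrs, _ = kg.retrieve(entity)
    return [v for v in attrs.values() if v.lower() not in caption.lower()]

# Example with a culturally specific subject (hypothetical data):
kg = KnowledgeGraph()
kg.nodes["Garuda"] = {"appearance": "golden wings", "form": "eagle-headed"}
kg.edges.append(("Garuda", "is the mount of", "Vishnu"))
prompt = enrich_prompt("A painting of Garuda", "Garuda", kg)
```

A generation step would use `prompt` in place of the raw user prompt; if `missing_facts` reports omissions for a caption of the result, the system would regenerate with those facts emphasized.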