Diffusion models enable high-quality and diverse visual content synthesis. However, they struggle to generate rare or unseen concepts. To address this challenge, we explore the use of Retrieval-Augmented Generation (RAG) with image generation models. We propose ImageRAG, a method that dynamically retrieves relevant images based on a given text prompt and uses them as context to guide the generation process. Prior approaches that used retrieved images to improve generation trained models specifically for retrieval-based generation. In contrast, ImageRAG leverages the capabilities of existing image conditioning models and does not require RAG-specific training. Our approach is highly adaptable and can be applied across different model types, showing significant improvement in generating rare and fine-grained concepts with different base models. Our project page is available at: https://rotem-shalev.github.io/ImageRAG
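The core retrieval step described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes the prompt and candidate images have already been embedded into a shared space (in practice a vision-language model such as CLIP would produce these embeddings), and ranks candidates by cosine similarity to select the top-k reference images. All names and the toy embeddings are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_references(prompt_emb, image_embs, k=2):
    """Rank candidate images by similarity to the prompt embedding;
    return the ids of the top-k matches to use as generation context."""
    ranked = sorted(image_embs.items(),
                    key=lambda item: cosine(prompt_emb, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy 3-d embeddings standing in for real text/image features.
prompt_emb = [1.0, 0.0, 0.0]
image_embs = {
    "shoebill.jpg": [0.9, 0.1, 0.0],
    "cat.jpg":      [0.0, 1.0, 0.0],
    "stork.jpg":    [0.7, 0.3, 0.1],
}
refs = retrieve_references(prompt_emb, image_embs, k=2)
# The two bird images are closest to the prompt and become the
# conditioning context for the image generator.
```

The retrieved images would then be passed to an existing image-conditioning mechanism of the base model (e.g. an image-prompt adapter), which is what lets the approach work without RAG-specific training.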