Despite recent advances in text-to-image generation, models still struggle with complex and imaginative prompts because their training data is limited in diversity and complexity. This work explores how diffusion models can generate images from prompts that require artistic creativity or specialized knowledge. We introduce the Realistic-Fantasy Benchmark (RFBench), a novel evaluation framework that blends realistic and fantastical scenarios. To address these challenges, we propose the Realistic-Fantasy Network (RFNet), a training-free approach that integrates diffusion models with large language models (LLMs). Extensive human evaluations and GPT-based compositional assessments demonstrate our approach's superiority over state-of-the-art methods. Our code and dataset are available at https://leo81005.github.io/Reality-and-Fantasy/.