Text-to-image generation has recently seen remarkable progress. We introduce RAPHAEL, a text-conditional image diffusion model that generates highly artistic images which accurately portray text prompts containing multiple nouns, adjectives, and verbs. This is achieved by stacking tens of mixture-of-experts (MoE) layers, namely space-MoE and time-MoE layers, which enable billions of diffusion paths (routes) from the network input to the output. Intuitively, each path functions as a "painter" that depicts a particular textual concept in a specified image region at a given diffusion timestep. Comprehensive experiments show that RAPHAEL outperforms recent cutting-edge models, such as Stable Diffusion, ERNIE-ViLG 2.0, DeepFloyd, and DALL-E 2, in both image quality and aesthetic appeal. First, RAPHAEL exhibits superior performance in switching images across diverse styles, such as Japanese comics, realism, cyberpunk, and ink illustration. Second, a single model with three billion parameters, trained on 1,000 A100 GPUs for two months, achieves a state-of-the-art zero-shot FID score of 6.61 on the COCO dataset. Furthermore, RAPHAEL significantly surpasses its counterparts in human evaluation on the ViLG-300 benchmark. We believe that RAPHAEL has the potential to advance the frontiers of image generation research in both academia and industry, paving the way for future breakthroughs in this rapidly evolving field. More details can be found on the project webpage: https://raphael-painter.github.io/.
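To give a feel for how stacking tens of MoE layers can yield billions of routes, the following is a minimal sketch and not the authors' implementation. It assumes top-1 routing, in which each layer selects exactly one expert, so the number of distinct input-to-output paths is the product of the per-layer expert counts. The layer count, the expert counts, the gate scores, and the helper names `count_routes` and `trace_route` are all hypothetical and chosen only for illustration.

```python
from math import prod


def count_routes(experts_per_layer):
    """Count distinct input-to-output paths through a stack of MoE layers,
    assuming each layer selects exactly one expert (top-1 routing)."""
    return prod(experts_per_layer)


def trace_route(gate_scores_per_layer):
    """Return, for each layer, the index of the highest-scoring expert.
    The resulting list of indices is one concrete path through the stack."""
    return [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in gate_scores_per_layer
    ]


# Hypothetical configuration (NOT taken from the paper): 16 layers, 4 experts each.
hypothetical_layers = [4] * 16
total_routes = count_routes(hypothetical_layers)

# Hypothetical gate scores for a two-layer toy stack.
toy_route = trace_route([[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]])

print(total_routes)  # 4**16 = 4294967296
print(toy_route)     # [1, 0]
```

Under these assumed numbers, 16 layers with 4 experts each already give about 4.3 billion routes, which shows why a modest number of experts per layer is enough to reach the scale the abstract describes.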