Recent breakthroughs in language-guided image generation have enabled the creation of high-quality, diverse images from user instructions. Despite this impressive synthesis performance, current image generation models remain limited in their ability to generate coherent text within images, particularly for complex glyph structures such as Chinese characters. To address this problem, we introduce GlyphDraw, a general learning framework that aims to endow image generation models with the capacity to generate images embedded with coherent text. To the best of our knowledge, this is the first work in the field of image synthesis to address the generation of Chinese characters. We first carefully design a construction strategy for the image-text dataset, then build our model on a diffusion-based image generator and modify the network structure so that the model can learn to draw Chinese characters with the help of glyph and position information. Furthermore, we preserve the model's open-domain image synthesis capability by using a variety of training techniques to prevent catastrophic forgetting. Extensive qualitative and quantitative experiments demonstrate that our method not only produces accurate Chinese characters as specified in prompts, but also blends the generated text naturally into the background. Please refer to https://1073521013.github.io/glyph-draw.github.io