GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

Posters play a crucial role in marketing and advertising, contributing significantly to industrial design by enhancing visual communication and brand visibility. With recent advances in controllable text-to-image diffusion models, more research now focuses on rendering text within synthetic images. Despite improvements in text rendering accuracy, end-to-end poster generation remains underexplored. This complex task involves balancing text rendering accuracy against automated layout to produce high-resolution images with variable aspect ratios. To tackle this challenge, we propose an end-to-end text rendering framework employing a triple cross-attention mechanism rooted in alignment learning, designed to render precise poster text within detailed contextual backgrounds. Additionally, we introduce a high-resolution dataset whose images exceed 1024 pixels in resolution. Our approach leverages the SDXL architecture. Extensive experiments validate the ability of our method to generate poster images featuring intricate and contextually rich backgrounds. Code will be available at https://github.com/OPPO-Mente-Lab/GlyphDraw2.

