COLE: A Hierarchical Generation Framework for Graphic Design

Graphic design, which has evolved since the 15th century, plays a crucial role in advertising. Creating high-quality designs demands creativity, innovation, and lateral thinking. The task involves understanding the objective; crafting visual elements such as the background, decoration, font, color, and shape; formulating diverse professional layouts; and adhering to fundamental visual design principles. In this paper, we introduce COLE, a hierarchical generation framework designed to address these challenges comprehensively. COLE can transform a simple intention prompt into a high-quality graphic design, while also supporting flexible editing based on user input. An example of such input is the directive ``design a poster for Hisaishi's concert.'' The key insight is to decompose the complex task of text-to-design generation into a hierarchy of simpler sub-tasks, each handled by a specialized model, with the models working collaboratively. Their results are then consolidated into a cohesive final output. This hierarchical task decomposition streamlines the process and significantly improves generation reliability. COLE consists of multiple fine-tuned Large Language Models (LLMs), Large Multimodal Models (LMMs), and Diffusion Models (DMs), each tailored to a design-aware text or image generation task. Furthermore, we construct the DESIGNERINTENTION benchmark to demonstrate the superiority of COLE over existing methods in generating high-quality graphic designs from user intent. We view COLE as an important step towards addressing more complex visual design generation tasks in the future.
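
To make the idea of hierarchical task decomposition concrete, the following is a minimal sketch and not the COLE implementation. The abstract does not specify the individual sub-tasks, so the stage names (`plan`, `background`, `typography`), the `DesignState` container, and the helper functions are all hypothetical placeholders. Each stage is a stub standing in for a fine-tuned model and simply returns a string. The sketch only illustrates the control flow: stages run in order, each one reads the outputs of earlier stages, and a user edit to one artifact re-runs only the downstream stages.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DesignState:
    """Hypothetical container: the user intention plus each stage's output."""
    intention: str
    artifacts: Dict[str, str] = field(default_factory=dict)


Stage = Callable[[DesignState], str]


# Stub stages. In a real system each would call a specialized model;
# here they only build strings so the example runs without dependencies.
def plan_design(state: DesignState) -> str:
    return f"plan for: {state.intention}"


def render_background(state: DesignState) -> str:
    return f"background from [{state.artifacts['plan']}]"


def place_text(state: DesignState) -> str:
    return f"typography over [{state.artifacts['background']}]"


# Assumed ordering of sub-tasks; later stages depend on earlier ones.
PIPELINE: List[Tuple[str, Stage]] = [
    ("plan", plan_design),
    ("background", render_background),
    ("typography", place_text),
]


def run_pipeline(intention: str) -> DesignState:
    """Run every stage in order, storing each output under its stage name."""
    state = DesignState(intention)
    for name, stage in PIPELINE:
        state.artifacts[name] = stage(state)
    return state


def edit_and_rerun(state: DesignState, name: str, new_value: str) -> DesignState:
    """Replace one stage's output with a user edit, then re-run later stages."""
    names = [n for n, _ in PIPELINE]
    start = names.index(name)
    state.artifacts[name] = new_value
    for later_name, stage in PIPELINE[start + 1:]:
        state.artifacts[later_name] = stage(state)
    return state


def consolidate(state: DesignState) -> str:
    """Merge all stage outputs into one final description."""
    return " | ".join(f"{n}: {out}" for n, out in state.artifacts.items())
```

Calling `run_pipeline("design a poster for Hisaishi's concert")` fills the three artifacts in order, and `edit_and_rerun(state, "background", ...)` leaves the plan untouched while regenerating the typography. Keeping each sub-task behind a small, uniform interface is one way to let individual stages be swapped or edited without re-running the whole pipeline.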
