Layout generation is the keystone of automated graphic design, requiring the position and size of various multi-modal design elements to be arranged in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic layout generation, leveraging a multi-modal large language model (MLLM) to accommodate diverse design tasks. Unlike prior methods, our data-driven approach employs structured text (JSON format) and visual instruction tuning to generate layouts under specific visual and textual constraints, including user-defined natural language specifications. We conducted extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks, demonstrating the effectiveness of our method. Moreover, recognizing existing datasets' limitations in capturing the complexity of real-world graphic designs, we introduce two new datasets for more challenging tasks (user-constrained generation and complicated poster design), further validating our model's utility in real-life settings. Marked by its superior accessibility and adaptability, this approach further automates large-scale graphic design tasks. The code and datasets will be publicly available at https://github.com/posterllava/PosterLLaVA.