Layout generation is the keystone of automated graphic design, requiring the position and size of various multi-modal design elements to be arranged in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic layout generation, leveraging a multi-modal large language model (MLLM) to accommodate diverse design tasks. Unlike prior methods, our data-driven approach employs structured text (JSON format) and visual instruction tuning to generate layouts under specific visual and textual constraints, including user-defined natural language specifications. We conducted extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks, demonstrating the effectiveness of our method. Moreover, recognizing existing datasets' limitations in capturing the complexity of real-world graphic designs, we introduce two new datasets for more challenging tasks (user-constrained generation and complicated poster design), further validating our model's utility in real-life settings. Marked by its superior accessibility and adaptability, this approach further automates large-scale graphic design tasks. The code and datasets will be publicly available at https://github.com/posterllava/PosterLLaVA.