LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation

Conditional layout generation aims to automatically generate visually appealing and semantically coherent layouts from user-defined constraints. Recent methods based on generative models have shown promising results, but they typically require large amounts of training data or extensive fine-tuning, which limits their versatility and practical applicability. Training-free approaches that use in-context learning with Large Language Models (LLMs) have emerged as an alternative, but they often suffer from limited reasoning capabilities and overly simplistic ranking mechanisms, which restrict their ability to generate consistently high-quality layouts. To address these limitations, we propose LayoutCoT, a novel approach that leverages the reasoning capabilities of LLMs by combining Retrieval-Augmented Generation (RAG) with Chain-of-Thought (CoT) techniques. Specifically, LayoutCoT first transforms layout representations into a standardized serialized format suitable for processing by LLMs. A layout-aware RAG module then retrieves relevant exemplars and prompts the LLM to generate a coarse layout. This preliminary layout, together with the selected exemplars, is fed into a specially designed CoT reasoning module for iterative refinement, which significantly enhances both semantic coherence and visual quality. We conduct extensive experiments on five public datasets spanning three conditional layout generation tasks. Experimental results demonstrate that LayoutCoT achieves state-of-the-art performance without requiring training or fine-tuning. Notably, our CoT reasoning module enables standard LLMs, even those without explicit deep reasoning abilities, to outperform specialized deep-reasoning models such as DeepSeek-R1, highlighting the potential of our approach to unleash the deep reasoning capabilities of LLMs for layout generation tasks.
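
The three stages described above (serialization, exemplar retrieval with coarse generation, and iterative refinement) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Element` type, the line-based text format, the greedy same-label IoU similarity used as a stand-in for layout-aware retrieval, the prompt wording, and the `llm` callable, which is a hypothetical function mapping a prompt string to a completion string. The query is assumed to be a rough input layout rather than an arbitrary constraint.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Element:
    label: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

Layout = List[Element]

def serialize(elements: Layout, canvas_w: int, canvas_h: int) -> str:
    """Render a layout as plain text, one line per element (assumed format)."""
    lines = [f"canvas {canvas_w}x{canvas_h}"]
    for e in elements:
        lines.append(f"{e.label} left={e.x} top={e.y} width={e.w} height={e.h}")
    return "\n".join(lines)

def iou(a: Element, b: Element) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union else 0.0

def similarity(query: Layout, candidate: Layout) -> float:
    """Toy similarity: greedily match each query element to the unused
    same-label candidate element with the highest IoU."""
    if not query:
        return 0.0
    unused = list(candidate)
    total = 0.0
    for q in query:
        same = [c for c in unused if c.label == q.label]
        if not same:
            continue
        best = max(same, key=lambda c: iou(q, c))
        total += iou(q, best)
        unused.remove(best)
    return total / max(len(query), len(candidate))

def retrieve(query: Layout, corpus: List[Layout], k: int = 2) -> List[Layout]:
    """Return the k corpus layouts most similar to the query."""
    return sorted(corpus, key=lambda c: similarity(query, c), reverse=True)[:k]

def generate(query: Layout, corpus: List[Layout], llm: Callable[[str], str],
             canvas_w: int, canvas_h: int, k: int = 2, rounds: int = 2) -> str:
    """Coarse generation from retrieved exemplars, then `rounds` refinement passes."""
    exemplars = retrieve(query, corpus, k)
    shots = "\n\n".join(serialize(e, canvas_w, canvas_h) for e in exemplars)
    # Stage 1: coarse layout conditioned on the retrieved exemplars.
    draft = llm(f"Examples:\n{shots}\n\n"
                f"Input:\n{serialize(query, canvas_w, canvas_h)}\n\nLayout:")
    # Stage 2: step-by-step refinement of the current draft.
    for _ in range(rounds):
        draft = llm(f"Examples:\n{shots}\n\nDraft:\n{draft}\n\n"
                    "Think step by step about overlap and alignment problems, "
                    "then output a revised layout:")
    return draft
```

With `rounds=2`, the sketch calls `llm` three times: once for the coarse layout and once per refinement pass. A real system would also need to parse the model output back into elements and validate it, which is omitted here.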
