Creating graphic layouts is a fundamental step in graphic design. In this work, we present a novel generative model named LayoutDiffusion for automatic layout generation. Because a layout is typically represented as a sequence of discrete tokens, LayoutDiffusion models layout generation as a discrete denoising diffusion process. It learns to reverse a mild forward process, in which layouts become increasingly chaotic as the number of forward steps grows, while layouts at neighboring steps differ only slightly. Designing such a mild forward process is, however, very challenging, because a layout has both categorical attributes and ordinal attributes. To tackle this challenge, we summarize three critical factors for achieving a mild forward process for layouts: legality, coordinate proximity, and type disruption. Based on these factors, we propose a block-wise transition matrix coupled with a piece-wise linear noise schedule. Experiments on the RICO and PubLayNet datasets show that LayoutDiffusion significantly outperforms state-of-the-art approaches. Moreover, it enables two conditional layout generation tasks in a plug-and-play manner without re-training and achieves better performance than existing methods on both.
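To make the idea of a block-wise transition matrix with a piece-wise linear schedule more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy vocabulary of ordinal coordinate bins followed by categorical type tokens and one absorbing `[MASK]` token. The Gaussian kernel for coordinates, the absorbing behavior for types, the function names, and all schedule values are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def coord_block(n_bins, sigma):
    """Row-stochastic block over ordinal coordinate bins.

    Nearby bins get higher probability (coordinate proximity).
    The Gaussian kernel is an assumption made for illustration.
    """
    if sigma <= 0:
        return np.eye(n_bins)
    idx = np.arange(n_bins)
    dist2 = (idx[:, None] - idx[None, :]) ** 2
    weights = np.exp(-dist2 / (2.0 * sigma ** 2))
    return weights / weights.sum(axis=1, keepdims=True)


def type_block(n_types, mask_prob):
    """Block over categorical type tokens plus one absorbing [MASK] token.

    A type token either stays unchanged or jumps to [MASK] (last index).
    The absorbing design is an assumption made for illustration.
    """
    q = np.zeros((n_types + 1, n_types + 1))
    q[:n_types, :n_types] = (1.0 - mask_prob) * np.eye(n_types)
    q[:n_types, -1] = mask_prob
    q[-1, -1] = 1.0  # [MASK] never leaves the mask state
    return q


def transition_matrix(t, T=100, n_bins=8, n_types=3, t_break=50):
    """Hypothetical block-wise transition matrix at forward step t.

    Both noise levels follow piece-wise linear schedules with a single
    breakpoint at t_break; the endpoint values are arbitrary choices.
    """
    sigma = np.interp(t, [0, t_break, T], [0.0, 2.0, 4.0])
    mask_prob = np.interp(t, [0, t_break, T], [0.0, 0.0, 1.0])

    size = n_bins + n_types + 1
    q = np.zeros((size, size))
    # Off-diagonal blocks stay zero, so a coordinate token can never
    # turn into a type token or vice versa (legality).
    q[:n_bins, :n_bins] = coord_block(n_bins, sigma)
    q[n_bins:, n_bins:] = type_block(n_types, mask_prob)
    return q
```

In this sketch the coordinates are perturbed from the first step onward, while the type tokens stay intact until `t_break` and are only then gradually masked. That ordering is one possible reading of "type disruption", chosen here so that early steps keep layouts close to the data.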