Graphic layout designs play an essential role in visual communication. Yet handcrafting them demands skill, takes time, and does not scale to batch production. Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' multimodal desires, i.e., designs constrained by background images and driven by foreground content. We propose LayoutDETR, which inherits the high quality and realism of generative modeling while reformulating content-aware requirements as a detection problem: we learn to detect, in a background image, reasonable locations, scales, and spatial relations for the multimodal foreground elements of a layout. Our solution sets a new state of the art for layout generation on public benchmarks and on our newly curated ad banner dataset. We integrate our solution into a graphical system that facilitates user studies, and show that users prefer our designs over baselines by significant margins. Our code, models, dataset, graphical system, and demos are available at https://github.com/salesforce/LayoutDETR.