Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Although generative models have emerged that make design automation no longer utopian, it remains non-trivial to customize designs that comply with designers' multimodal desires, i.e., designs constrained by background images and driven by foreground contents. In this study, we propose \textit{LayoutDETR}, which inherits the high quality and realism of generative modeling while reformulating content-aware requirements as a detection problem: we learn to detect in a background image the reasonable locations, scales, and spatial relations for multimodal elements in a layout. Experiments validate that our solution yields new state-of-the-art performance for layout generation on public benchmarks and on our newly curated ads banner dataset. For practical usage, we build our solution into a graphical system that facilitates user studies. We demonstrate that our designs attract more subjective preferences than baselines by significant margins. Our code, models, dataset, graphical system, and demos are available at https://github.com/salesforce/LayoutDETR.