Spatial control is a core capability in controllable image generation. Recent advances in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform on out-of-distribution (OOD) samples with arbitrary, unseen layouts. In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the image boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions step by step via inpainting and generalizes better than existing models to the OOD layouts in LayoutBench. We perform quantitative and qualitative evaluations, together with a fine-grained analysis of the four LayoutBench skills, to pinpoint the weaknesses of existing models. Lastly, we present comprehensive ablation studies on IterInpaint, covering the training task ratio, crop-and-paste vs. repaint, and generation order. Project website: https://layoutbench.github.io
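The abstract describes IterInpaint only at a high level: foreground objects and then the background are generated one region at a time by inpainting. The following is a minimal sketch of that loop under stated assumptions, not the authors' implementation. `inpaint_fn` is a hypothetical stand-in for any mask-conditioned inpainting model (image, boolean mask, text prompt in; image out), the blank starting canvas and the `"background"` prompt are assumptions, and the names `LayoutObject`, `iterative_inpaint`, and `crop_and_paste` are illustrative rather than taken from the paper. The `crop_and_paste` flag shows one plausible reading of the crop-and-paste vs. repaint ablation, and the order of `layout` plays the role of the generation order.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

# Box as (x0, y0, x1, y1) in pixel coordinates; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int]


@dataclass
class LayoutObject:
    label: str  # text prompt for this object (hypothetical field)
    box: Box


# Hypothetical inpainting model: (image HxWx3, mask HxW bool, prompt) -> image HxWx3.
InpaintFn = Callable[[np.ndarray, np.ndarray, str], np.ndarray]


def box_mask(height: int, width: int, box: Box) -> np.ndarray:
    """Boolean mask that is True inside the box."""
    mask = np.zeros((height, width), dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = True
    return mask


def iterative_inpaint(
    layout: List[LayoutObject],
    inpaint_fn: InpaintFn,
    height: int,
    width: int,
    background_prompt: str = "background",
    crop_and_paste: bool = True,
) -> np.ndarray:
    """Generate each foreground box in list order, then fill the remaining background."""
    image = np.zeros((height, width, 3), dtype=np.float32)  # assumed blank canvas
    covered = np.zeros((height, width), dtype=bool)

    for obj in layout:  # list order = generation order
        mask = box_mask(height, width, obj.box)
        out = inpaint_fn(image, mask, obj.label)
        if crop_and_paste:
            # Keep only the newly inpainted box; pixels outside it stay untouched.
            image = np.where(mask[..., None], out, image)
        else:
            # "Repaint": accept the model's full output, which may alter other regions.
            image = out
        covered |= mask

    # Final step: inpaint every pixel not covered by any foreground box.
    bg_mask = ~covered
    out = inpaint_fn(image, bg_mask, background_prompt)
    return np.where(bg_mask[..., None], out, image) if crop_and_paste else out
```

With crop-and-paste, an object generated early cannot be overwritten by later steps unless a later box overlaps it; with repaint, each step's output replaces the whole canvas, so earlier content is only preserved if the model itself leaves it unchanged.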