In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts and is trained using single-view images. Unlike most existing 3D GANs, which are limited to aligned single objects, we focus on generating complex scenes with multiple objects by modeling the compositional nature of 3D scenes. By devising a 2D layout-based approach to 3D synthesis and implementing a new 3D field representation with a stronger geometric inductive bias, we obtain a 3D GAN that is both efficient and high-quality while allowing for a more controllable generation process. Our evaluations on the synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality compared to previous works.