Controllable generation is considered a potentially vital approach to addressing the challenge of annotating 3D data, and the precision of such controllable generation becomes particularly imperative in the context of data production for autonomous driving. Existing methods focus on integrating diverse generative information into control inputs, utilizing frameworks such as GLIGEN or ControlNet to produce commendable outcomes in controllable generation. However, such approaches intrinsically restrict generation performance to the learning capacities of predefined network architectures. In this paper, we explore the integration of control information and introduce PerlDiff (Perspective-Layout Diffusion Models), a method for effective street view image generation that fully leverages perspective 3D geometric information. Our PerlDiff employs 3D geometric priors to guide the generation of street view images with precise object-level control within the network learning process, resulting in more robust and controllable outputs. Moreover, it demonstrates superior controllability compared to alternative layout control methods. Empirical results demonstrate that PerlDiff markedly enhances the precision of generation on the NuScenes and KITTI datasets. Our codes and models are publicly available at https://github.com/LabShuHangGU/PerlDiff.
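As context for how perspective 3D geometric priors can provide object-level control, the following is a minimal sketch (not the official PerlDiff code): each 3D bounding box is projected through the camera matrices into the image plane, and the resulting convex region becomes a per-object guidance mask. The function name `project_boxes_to_masks`, the mask resolution, and the use of OpenCV for rasterization are illustrative assumptions, not the paper's API.

```python
# Minimal sketch: turn 3D box priors into perspective-view, per-object masks.
# All names and shapes here are assumptions for illustration only.
import numpy as np
import cv2  # used only to rasterize the projected convex hull


def project_boxes_to_masks(corners_3d, intrinsics, extrinsics, hw=(256, 704)):
    """corners_3d: (N, 8, 3) box corners in ego/world coordinates.
    intrinsics: (3, 3) camera matrix; extrinsics: (4, 4) world-to-camera.
    Returns an (N, H, W) float array with one binary mask per object."""
    h, w = hw
    masks = np.zeros((len(corners_3d), h, w), dtype=np.float32)
    for i, corners in enumerate(corners_3d):
        # Transform corners to the camera frame; skip boxes behind the camera.
        pts = (extrinsics @ np.c_[corners, np.ones(8)].T)[:3].T
        if (pts[:, 2] <= 0.1).any():
            continue
        # Perspective projection onto the image plane.
        uv = (intrinsics @ pts.T).T
        uv = uv[:, :2] / uv[:, 2:3]
        # Rasterize the convex hull of the projected corners as the object mask.
        hull = cv2.convexHull(uv.astype(np.int32))
        cv2.fillConvexPoly(masks[i], hull, 1.0)
    return masks
```

Such masks can then be used to restrict where each object's condition (e.g., its box and category embedding) influences the image, which is one plausible way to realize the object-level geometric guidance described above.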