Generating realistic and diverse layouts of furnished indoor 3D scenes unlocks multiple interactive applications across a wide range of industries. The inherent complexity of object interactions, the limited amount of available data, and the requirement to fulfill spatial constraints all make generative modeling for 3D scene synthesis and arrangement challenging. Current methods address these challenges either autoregressively or with off-the-shelf diffusion objectives that predict all attributes simultaneously, without explicit 3D reasoning. In this paper, we introduce DeBaRA, a score-based model specifically tailored for precise, controllable and flexible arrangement generation in a bounded environment. We argue that the most critical component of a scene synthesis system is to accurately establish the size and position of the various objects within a restricted area. Based on this insight, we propose a lightweight conditional score-based model designed with 3D spatial awareness at its core. We demonstrate that, by focusing on the spatial attributes of objects, a single trained DeBaRA model can be leveraged at test time to perform several downstream applications such as scene synthesis, completion and re-arrangement. Further, we introduce a novel Self Score Evaluation procedure so that the model can be optimally employed alongside external LLMs. We evaluate our approach through extensive experiments and demonstrate significant improvement over state-of-the-art approaches in a range of scenarios.