This work introduces a new task, instance-incremental scene graph generation: given the point cloud of an empty room, represent it as a graph and automatically add novel instances to it, so that the final graph describes the object layout of the scene. The task is important because it can guide the insertion of novel 3D objects into real-world scenes in vision-based applications such as augmented reality. It is also challenging because the complexity of real-world point clouds makes it difficult to learn object layout experience from the observation data (non-empty rooms with labeled semantics). We model the task as a conditional generation problem and propose a 3D autoregressive framework based on normalizing flows (3D-ANF) to address it. We first represent the point cloud as a graph by extracting the label semantics and contextual relationships it contains. Next, a model based on normalizing flows maps the conditional generation of graph elements to a Gaussian process. Because the mapping is invertible, the real-world experience captured in the observation data can be modeled during training, and novel instances can be generated sequentially from the Gaussian process at test time. We implement this task on datasets of 3D point-based scenes (3DSSG and 3RScan) and evaluate our method on them. Experiments show that our method generates reliable novel graphs from real-world point clouds and achieves state-of-the-art performance on the benchmark dataset.
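The role of invertibility in the abstract can be illustrated with a toy example. The sketch below is not the 3D-ANF architecture; it is a minimal one-dimensional conditional affine flow, written under the assumption that each graph element is a single scalar and that the context is summarized by the mean of the elements generated so far. The names `ConditionalAffineFlow` and `generate_instances`, and the weights `w_shift` and `w_log_scale`, are hypothetical and introduced only for illustration. The forward direction maps data to a Gaussian latent and yields a likelihood (the training direction); the inverse maps Gaussian samples back to data, one instance at a time (the generation direction).

```python
import math
import random


class ConditionalAffineFlow:
    """Toy invertible map x <-> z, conditioned on a scalar context c.

    z = (x - shift(c)) * exp(-log_scale(c)); the scale is always positive,
    so the map is invertible. The weights are hypothetical placeholders,
    not learned parameters.
    """

    def __init__(self, w_shift=0.5, w_log_scale=0.1):
        self.w_shift = w_shift
        self.w_log_scale = w_log_scale

    def _params(self, c):
        # Shift and log-scale depend linearly on the context (an assumption).
        return self.w_shift * c, self.w_log_scale * c

    def forward(self, x, c):
        """Data -> latent. Also returns log|dz/dx| for the likelihood."""
        shift, log_scale = self._params(c)
        z = (x - shift) * math.exp(-log_scale)
        return z, -log_scale

    def inverse(self, z, c):
        """Latent -> data; the exact inverse of forward for the same c."""
        shift, log_scale = self._params(c)
        return z * math.exp(log_scale) + shift

    def log_prob(self, x, c):
        """Change of variables: log p(x|c) = log N(z; 0, 1) + log|dz/dx|."""
        z, log_det = self.forward(x, c)
        log_pz = -0.5 * (z * z + math.log(2.0 * math.pi))
        return log_pz + log_det


def generate_instances(flow, context, n_new, rng):
    """Sample n_new values one at a time, each conditioned on all earlier ones."""
    values = list(context)
    for _ in range(n_new):
        # Context summary: mean of existing values (0.0 if there are none).
        c = sum(values) / len(values) if values else 0.0
        z = rng.gauss(0.0, 1.0)  # draw from the Gaussian latent
        values.append(flow.inverse(z, c))
    return values[len(context):]


if __name__ == "__main__":
    flow = ConditionalAffineFlow()
    rng = random.Random(0)
    # Two existing "room" features act as the initial context.
    print(generate_instances(flow, [0.2, 0.4], n_new=3, rng=rng))
    print(flow.log_prob(0.3, c=0.3))
```

In a training setting, `log_prob` would be maximized over observed scenes to fit the weights; here they are fixed so that the example stays self-contained.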