This work introduces a new task, instance-incremental scene graph generation: given a point-cloud scene, represent it as a graph and automatically add novel instances, so that the final output is a graph describing the object layout of the scene. The task matters because it can guide the insertion of novel 3D objects into real-world scenes in vision-based applications such as augmented reality. It is also challenging, because the complexity of real-world point clouds makes it difficult to learn object-layout experience from the observation data (non-empty rooms with labeled semantics). We model the task as a conditional generation problem and propose a 3D autoregressive framework based on normalizing flows (3D-ANF) to address it. First, we represent the point cloud as a graph by extracting label semantics and contextual relationships. Next, we introduce a model based on normalizing flows that maps the conditional generation of graph elements to a Gaussian process. Because the mapping is invertible, the real-world experience captured in the observation data can be modeled in the training phase, and novel instances can be generated autoregressively from the Gaussian process in the testing phase. To evaluate our method thoroughly, we carry out this new task on the indoor benchmark dataset 3DSSG-O27R16 and on GPL3D, our newly proposed graph dataset of outdoor scenes. Experiments show that our method generates reliable novel graphs from real-world point clouds and achieves state-of-the-art performance on both datasets.
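The role of invertibility in the abstract can be illustrated with a toy example. The sketch below is not the 3D-ANF model; it is a minimal one-dimensional conditional affine flow, assuming a single scalar attribute per instance and a context that is just the mean of the instances already in the scene. The class `ConditionalAffineFlow`, the function `generate_autoregressively`, and all parameter names are hypothetical and chosen only for illustration. The forward direction maps an observed value to a standard Gaussian variable and gives an exact log-likelihood (what training would use); the inverse direction turns a Gaussian sample into a new instance conditioned on what has been generated so far (what testing would use).

```python
import math
import random


class ConditionalAffineFlow:
    """Toy 1-D conditional flow: z = (x - mu(c)) / sigma(c).

    mu and log sigma are linear in the context c. The weights are
    hypothetical stand-ins for what a learned network would output.
    """

    def __init__(self, w_mu, b_mu, w_log_sigma, b_log_sigma):
        self.w_mu = w_mu
        self.b_mu = b_mu
        self.w_log_sigma = w_log_sigma
        self.b_log_sigma = b_log_sigma

    def _params(self, c):
        mu = self.w_mu * c + self.b_mu
        log_sigma = self.w_log_sigma * c + self.b_log_sigma
        return mu, log_sigma

    def forward(self, x, c):
        """Data -> latent. Returns z and log|dz/dx|."""
        mu, log_sigma = self._params(c)
        z = (x - mu) * math.exp(-log_sigma)
        return z, -log_sigma

    def inverse(self, z, c):
        """Latent -> data. Exact inverse of forward for the same context."""
        mu, log_sigma = self._params(c)
        return z * math.exp(log_sigma) + mu

    def log_prob(self, x, c):
        """Exact log p(x | c) via the change-of-variables formula."""
        z, log_det = self.forward(x, c)
        return -0.5 * (z * z + math.log(2.0 * math.pi)) + log_det


def generate_autoregressively(flow, existing, n_new, rng):
    """Append n_new instances, each conditioned on all earlier ones.

    Assumption: the context is the mean of the current instance values
    (0.0 for an empty scene). A real model would use a graph encoding.
    """
    values = list(existing)
    for _ in range(n_new):
        context = sum(values) / len(values) if values else 0.0
        z = rng.gauss(0.0, 1.0)  # sample from the Gaussian base
        values.append(flow.inverse(z, context))
    return values


if __name__ == "__main__":
    flow = ConditionalAffineFlow(0.5, 1.0, 0.1, -0.2)
    rng = random.Random(0)
    print(generate_autoregressively(flow, [1.0, 2.0], 3, rng))
```

Because `inverse` undoes `forward` exactly for a fixed context, the same parameters serve both likelihood evaluation on observed data and sampling of new instances, which is the property the abstract relies on.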