This paper proposes an approach to build 3D scene graphs in arbitrary indoor and outdoor environments. Extending scene graphs beyond indoor settings is challenging: the hierarchy of concepts that describes an outdoor environment is more complex than its indoor counterpart, and manually defining such a hierarchy is time-consuming and does not scale. Furthermore, the lack of training data prevents the straightforward application of the learning-based tools used in indoor settings. To address these challenges, we propose two novel extensions. First, we develop methods to build a spatial ontology defining concepts and relations relevant for indoor and outdoor robot operation. In particular, we use a Large Language Model (LLM) to build such an ontology, thus largely reducing the amount of manual effort required. Second, we leverage the spatial ontology for 3D scene graph construction using Logic Tensor Networks (LTN) to add logical rules, or axioms (e.g., "a beach contains sand"), which provide additional supervisory signals at training time, thus reducing the need for labelled data, providing better predictions, and even allowing the prediction of concepts unseen at training time. We test our approach on a variety of datasets, including indoor, rural, and coastal environments, and show that it leads to a significant increase in the quality of 3D scene graph generation with sparsely annotated data.
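To illustrate how an axiom such as "a beach contains sand" can act as a supervisory signal without labels, the following is a minimal sketch and not the paper's implementation. It assumes a Reichenbach fuzzy implication and a plain mean over nodes as the aggregator; the names `implies`, `axiom_loss`, `p_beach`, and `p_sand` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def implies(a: float, b: float) -> float:
    """Reichenbach fuzzy implication: truth of (a -> b) for a, b in [0, 1]."""
    return 1.0 - a + a * b


def axiom_loss(p_beach: Sequence[float], p_sand: Sequence[float]) -> float:
    """Return 1 minus the mean satisfaction of 'beach(x) -> contains_sand(x)'.

    p_beach[i]: assumed predicted probability that place node i is a beach.
    p_sand[i]:  assumed predicted probability that node i contains sand.
    A plain mean stands in for the universal quantifier (an assumption).
    """
    satisfaction = [implies(a, b) for a, b in zip(p_beach, p_sand)]
    return 1.0 - sum(satisfaction) / len(satisfaction)


if __name__ == "__main__":
    # Node 0: confidently a beach, but no sand predicted (violates the axiom).
    # Node 1: not a beach, so the axiom is trivially satisfied.
    print(axiom_loss([1.0, 0.0], [0.0, 0.0]))
```

In a training loop, a term of this kind would be added to the usual supervised loss, so that unlabelled nodes still contribute a gradient whenever the predictions contradict the ontology.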