3D scene graph prediction aims to jointly predict object classes and their relationships in a 3D environment. Because such environments are largely designed by and for humans, commonsense knowledge about objects and their relationships can substantially constrain and improve scene graph prediction. In this paper, we investigate how commonsense knowledge graphs can be applied to 3D scene graph prediction from point clouds of indoor scenes. Experiments on a real-world indoor dataset show that integrating commonsense knowledge through message passing improves scene graph prediction accuracy over state-of-the-art algorithms by $15.0\%$ with external knowledge and by $7.96\%$ with internal knowledge. We also tested the model in the real world, generating scene graphs at 10 frames per second, to demonstrate its use in a more realistic robotics setting.
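The abstract names message passing as the mechanism for injecting commonsense knowledge but does not describe the architecture. As a generic illustration only, and not the paper's model, the sketch below runs one round of weighted mean-aggregation message passing over a small scene graph. It assumes that each object node carries a feature vector (e.g. class scores) and that commonsense knowledge supplies a scalar weight per ordered object pair; the function name `message_passing_step` and the `prior` dictionary are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_passing_step(
    features: Dict[str, Vector],
    edges: List[Tuple[str, str]],
    prior: Dict[Tuple[str, str], float],
) -> Dict[str, Vector]:
    """One round of weighted mean aggregation (hypothetical sketch).

    Each node receives a message from every neighbour, scaled by a
    commonsense weight prior[(node, neighbour)] (default 1.0). The
    averaged message is added to the node's own feature vector.
    """
    # Build an undirected adjacency list so information flows both ways.
    neighbours: Dict[str, List[str]] = {n: [] for n in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated: Dict[str, Vector] = {}
    for node, h in features.items():
        nbrs = neighbours[node]
        if not nbrs:
            # Isolated nodes keep their features unchanged.
            updated[node] = list(h)
            continue
        agg = [0.0] * len(h)
        for other in nbrs:
            w = prior.get((node, other), 1.0)  # assumed knowledge weight
            for k in range(len(h)):
                agg[k] += w * features[other][k]
        # Residual update: own features plus the mean weighted message.
        updated[node] = [h[k] + agg[k] / len(nbrs) for k in range(len(h))]
    return updated


features = {"chair": [1.0, 0.0], "table": [0.0, 1.0], "lamp": [0.5, 0.5]}
edges = [("chair", "table"), ("table", "lamp")]
prior = {("chair", "table"): 0.9, ("table", "chair"): 0.9}
print(message_passing_step(features, edges, prior))
```

Here the prior simply rescales neighbour messages; a real system might instead learn these weights or derive them from relation embeddings in a knowledge graph.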