3D Semantic Scene Graph Prediction aims to detect objects and their semantic relationships in 3D scenes, and has emerged as a crucial technology for robotics and AR/VR applications. While previous research has addressed dataset limitations and explored various approaches, including open-vocabulary settings, existing methods frequently fail to optimize the representational capacity of object and relationship features, and rely excessively on Graph Neural Networks even though the underlying features lack discriminative capability. In this work, we demonstrate through extensive analysis that the quality of object features plays a critical role in determining overall scene graph accuracy. To address this challenge, we design a highly discriminative object feature encoder and employ a contrastive pretraining strategy that decouples object representation learning from scene graph prediction. This design not only enhances object classification accuracy but also yields direct improvements in relationship prediction. Notably, when we plug our pretrained encoder into existing frameworks, we observe substantial performance improvements across all evaluation metrics. Additionally, whereas existing approaches have not fully exploited the integration of relationship information, we effectively combine geometric and semantic features to achieve superior relationship prediction. Comprehensive experiments on the 3DSSG dataset demonstrate that our approach significantly outperforms previous state-of-the-art methods. Our code is publicly available at https://github.com/VisualScienceLab-KHU/OCRL-3DSSG-Codes.
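The abstract does not state which contrastive objective is used for pretraining the object encoder. As a purely illustrative sketch, the widely used InfoNCE loss is shown below. The function names `l2_normalize` and `info_nce`, the temperature value, and the toy two-dimensional embeddings are all assumptions made for this example and are not taken from the paper or its repository.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the matching embedding for anchors[i]; every other
    row of `positives` acts as a negative for that anchor.
    (Assumed formulation, not necessarily the paper's objective.)
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled cosine similarity to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true match.
        total += log_denom - logits[i]
    return total / len(a)


# Toy embeddings: two objects, each observed under two hypothetical views.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(view_a, view_b)         # correct pairing
shuffled = info_nce(view_a, view_b[::-1])  # deliberately mismatched pairing
print(aligned < shuffled)
```

The loss is small when each anchor is closest to its own positive and large when the pairing is wrong, which is the property that pushes embeddings of different objects apart. Training an encoder with such an objective before attaching any graph reasoning module is one way to realize the decoupling described in the abstract.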