From Pixels to Concepts: Growing Rich 3D Semantic Scene Graph Forests utilizing Foundation Models


David Oberacker, Meike Deitersen, Niklas Spielbauer, Tristan Schnell, Georg Heppner, Arne Roennau

Source: arXiv. To be published in the Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE IROS 2026).

Operating in complex real-world environments requires robots to understand their surroundings at a functional semantic level. This demands a detailed, multi-layer world model that captures the complex relations within the environment. Hierarchical 3D scene graphs address this challenge by integrating geometric, semantic, and relational data within a unified spatial framework. However, current 3D scene graph approaches often restrict themselves to rigid structures of predetermined relationship classes and largely neglect important semantic connections, such as causal connections or environmental context. This paper explores the potential of foundation models to build forests of 3D scene graphs with open semantic relationships to improve scene understanding and robotic task execution. In the proposed method, a vision-language model (VLM) first identifies instance-specific concept-nodes and relationships; a large language model (LLM) then extends these through reasoning, inferring broader, more abstract concept-nodes and relationships. The resulting object-nodes, concept-nodes, and relationships are assembled into a forest of hierarchical 3D scene graphs, in which concept-nodes represent abstract concepts. Evaluations on the uHumans2 and ScanNet indoor datasets validate the accuracy and relevance of the generated relationships. The downstream suitability of scene-graph forests for robotics applications is demonstrated in an open-vocabulary object-retrieval task using both ScanNet data and a real-world indoor deployment with a Boston Dynamics Spot robot. Overall, this paper leverages foundation models to create more expressive, semantically deep hierarchical 3D scene graphs and demonstrates their potential to advance semantic and environmental understanding in robotics.
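To make the data model described in the abstract more concrete, here is a minimal Python sketch of a scene-graph forest with object-nodes, concept-nodes, and free-text (open-vocabulary) relationships. It is an illustration under stated assumptions, not the authors' implementation. The VLM and LLM stages are stubbed as hard-coded lists of (source, relation, target) triples instead of real model calls. Any relation endpoint that is not an observed object is treated as a new concept-node. The hierarchical layers (for example rooms or buildings) are omitted. Each connected component is treated as one graph of the "forest". Retrieval uses plain word overlap between the query and a node's label and attached relations, whereas a real open-vocabulary system would more likely compare foundation-model embeddings. The names `Node`, `SceneGraphForest`, `build_forest`, and `retrieve` are hypothetical.

```python
"""Hypothetical sketch of a 3D scene-graph forest with open-vocabulary relations.

Assumptions (not from the paper): VLM/LLM outputs arrive as plain triples,
hierarchy layers are omitted, and retrieval is simple word overlap.
"""
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Vec3 = Tuple[float, float, float]
Triple = Tuple[str, str, str]  # (source label, free-text relation, target label)


@dataclass
class Node:
    label: str
    kind: str                        # "object" (observed in 3D) or "concept" (abstract)
    position: Optional[Vec3] = None  # only object-nodes carry 3D geometry


@dataclass
class SceneGraphForest:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Triple] = field(default_factory=list)

    def add_object(self, label: str, position: Vec3) -> None:
        self.nodes[label] = Node(label, "object", position)

    def add_relation(self, src: str, relation: str, dst: str) -> None:
        # Any endpoint that is not an observed object becomes an abstract concept-node.
        for name in (src, dst):
            if name not in self.nodes:
                self.nodes[name] = Node(name, "concept")
        self.edges.append((src, relation, dst))

    def components(self) -> List[Set[str]]:
        """Connected components (edges treated as undirected); each is one graph of the forest."""
        adjacency: Dict[str, Set[str]] = {name: set() for name in self.nodes}
        for src, _, dst in self.edges:
            adjacency[src].add(dst)
            adjacency[dst].add(src)
        seen: Set[str] = set()
        result: List[Set[str]] = []
        for start in self.nodes:
            if start in seen:
                continue
            seen.add(start)
            component, queue = {start}, deque([start])
            while queue:  # breadth-first search from `start`
                for nxt in adjacency[queue.popleft()]:
                    if nxt not in seen:
                        seen.add(nxt)
                        component.add(nxt)
                        queue.append(nxt)
            result.append(component)
        return result

    def retrieve(self, query: str) -> List[str]:
        """Rank object-nodes by how many query words occur in their label or in any
        relation (including the other endpoint's label) directly attached to them."""
        words = set(query.lower().split())
        scores: Dict[str, int] = {}
        for name, node in self.nodes.items():
            if node.kind != "object":
                continue
            tokens = set(name.lower().split())
            for src, relation, dst in self.edges:
                if name in (src, dst):
                    tokens |= set(f"{src} {relation} {dst}".lower().split())
            score = len(words & tokens)
            if score > 0:
                scores[name] = score
        # Highest score first; ties broken alphabetically for deterministic output.
        return sorted(scores, key=lambda n: (-scores[n], n))


def build_forest(objects: Dict[str, Vec3], vlm_triples: List[Triple],
                 llm_triples: List[Triple]) -> SceneGraphForest:
    """Stage 1: instance-specific VLM triples; stage 2: more abstract LLM triples (both stubbed)."""
    forest = SceneGraphForest()
    for label, position in objects.items():
        forest.add_object(label, position)
    for src, relation, dst in vlm_triples + llm_triples:
        forest.add_relation(src, relation, dst)
    return forest


if __name__ == "__main__":
    forest = build_forest(
        objects={"mug": (1.0, 0.2, 0.9), "coffee machine": (1.3, 0.2, 1.0), "sofa": (4.0, 2.0, 0.4)},
        vlm_triples=[("mug", "placed next to", "coffee machine")],
        llm_triples=[("coffee machine", "enables", "making coffee"),
                     ("mug", "used for", "making coffee"),
                     ("sofa", "supports", "resting")],
    )
    print(len(forest.components()))         # 2
    print(forest.retrieve("making coffee"))  # ['coffee machine', 'mug']
```

In this toy scene, the LLM-style concept-node "making coffee" links the mug to the query even though the mug's own label never mentions coffee. This is the kind of benefit the abstract attributes to open semantic relationships, shown here only in miniature.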
