Vision Language Models Can Parse Floor Plan Maps - 专知论文

会员服务 ·

0

语言模型 · 视觉语言模型 · 问答 · 标注 · 标注信息 ·

2025 年 11 月 24 日

Vision Language Models Can Parse Floor Plan Maps

翻译：视觉语言模型能够解析平面布局图

David DeFazio,Hrudayangam Mehta,Meng Wang,Ping Yang,Jeremy Blackburn,Shiqi Zhang

Vision language models (VLMs) can simultaneously reason about images and texts to tackle many tasks, from visual question answering to image captioning. This paper focuses on map parsing, a novel task that is unexplored within the VLM context and particularly useful to mobile robots. Map parsing requires understanding not only the labels but also the geometric configurations of a map, i.e., what areas are like and how they are connected. To evaluate the performance of VLMs on map parsing, we prompt VLMs with floor plan maps to generate task plans for complex indoor navigation. Our results demonstrate the remarkable capability of VLMs in map parsing, with a success rate of 0.96 in tasks requiring a sequence of nine navigation actions, e.g., approaching and going through doors. Other than intuitive observations, e.g., VLMs do better in smaller maps and simpler navigation tasks, there was a very interesting observation that its performance drops in large open areas. We provide practical suggestions to address such challenges as validated by our experimental results. Webpage: https://sites.google.com/view/vlm-floorplan/

翻译：视觉语言模型（VLMs）能够同时对图像和文本进行推理，以处理从视觉问答到图像描述生成等多种任务。本文聚焦于地图解析这一在VLM背景下尚未被探索、且对移动机器人尤为有用的新任务。地图解析不仅需要理解地图的标注信息，还需理解其几何构型，即区域的特性及其相互连接方式。为评估VLMs在地图解析上的性能，我们通过向VLMs输入平面布局图来生成复杂室内导航的任务规划。实验结果表明，VLMs在地图解析方面展现出卓越能力，在需要连续执行九步导航动作（例如接近并穿过门）的任务中取得了0.96的成功率。除了一些直观的观察结论（例如VLMs在较小地图和较简单导航任务中表现更好），我们还发现一个非常有趣的现象：其在大型开放区域中的性能会下降。我们基于实验结果验证，提出了应对此类挑战的实用建议。项目网页：https://sites.google.com/view/vlm-floorplan/

0

相关内容

语言模型

机器人领域中最佳的三维场景表示是什么？——从几何表示到基础模型

机器人领域中最佳的三维场景表示是什么？——从几何表示到基础模型

专知会员服务

16+阅读 · 2025年12月4日

DeepSeek模型综述：V1 V2 V3 R1-Zero

DeepSeek模型综述：V1 V2 V3 R1-Zero

专知会员服务

116+阅读 · 2025年2月11日

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

专知会员服务

24+阅读 · 2023年5月10日

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

专知会员服务

17+阅读 · 2022年5月10日

UTC: 用于视觉对话的任务间对比学习的统一Transformer

UTC: 用于视觉对话的任务间对比学习的统一Transformer

专知会员服务

14+阅读 · 2022年5月4日

【干货书】1270页pdf《概率图模型：原理与技术》MIT出版，用于构建和使用复杂系统概率模型的通用框架，使计算机能够使用可用信息做出决策。Probabilistic Graphical Models: Principles and Applications

【干货书】1270页pdf《概率图模型：原理与技术》MIT出版，用于构建和使用复杂系统概率模型的通用框架，使计算机能够使用可用信息做出决策。Probabilistic Graphical Models: Principles and Applications

专知会员服务

47+阅读 · 2022年4月11日

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

专知会员服务

45+阅读 · 2022年3月6日

学习具有层次标签的图像表示，Learning Representations For Images With Hierarchical Labels

学习具有层次标签的图像表示，Learning Representations For Images With Hierarchical Labels

专知会员服务

38+阅读 · 2020年4月6日

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

专知会员服务

98+阅读 · 2019年12月31日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知会员服务

112+阅读 · 2019年11月25日

Python图像处理，366页pdf，Image Operators Image Processing in Python

Python图像处理，366页pdf，Image Operators Image Processing in Python

专知

15+阅读 · 2020年7月23日

对话管理的综述论文:最近的进展和挑战，A Survey on Dialog Management

对话管理的综述论文:最近的进展和挑战，A Survey on Dialog Management

专知

12+阅读 · 2020年5月14日

论文浅尝 | GEOM-GCN: Geometric Graph Convolutional Networks

论文浅尝 | GEOM-GCN: Geometric Graph Convolutional Networks

开放知识图谱

14+阅读 · 2020年4月8日

【CVPR2020-清华大学】渐进对抗网络的细粒度域适应，Progressive Adversarial Networks

【CVPR2020-清华大学】渐进对抗网络的细粒度域适应，Progressive Adversarial Networks

专知

31+阅读 · 2020年4月4日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知

245+阅读 · 2019年11月18日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

31+阅读 · 2018年7月12日

图像和文本的融合表示学习——Text2Image和Image2Text

图像和文本的融合表示学习——Text2Image和Image2Text

专知

125+阅读 · 2018年6月11日

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

开放知识图谱

36+阅读 · 2018年3月30日

“自然语言-草图”耦合的地理场景查询方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于深层特征学习的RGB-D人体行为识别方法

国家自然科学基金

4+阅读 · 2015年12月31日

“模块化自组装”DNA计算模型的研究

国家自然科学基金

3+阅读 · 2015年12月31日

2D/3D视觉信息融合仿生SLAM关键问题研究

国家自然科学基金

3+阅读 · 2015年12月31日

“数据-知识”驱动的大区域高分辨率遥感影像多尺度分割并行计算方法

国家自然科学基金

0+阅读 · 2015年12月31日

基于稀疏表达理论和RGBD图像的人脸表情识别

国家自然科学基金

0+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

基于上下文感知和异质特征集成的SAR图像分割与评价

国家自然科学基金

2+阅读 · 2015年12月31日

Web页面数据对象的感知理解与计算

国家自然科学基金

0+阅读 · 2014年12月31日

面向时空变化的GIS数据模型

国家自然科学基金

6+阅读 · 2014年12月31日

DeepSeek-V3 Technical Report

Arxiv

18+阅读 · 2024年12月27日

Is ChatGPT a Good Recommender? A Preliminary Study

Arxiv

176+阅读 · 2023年4月20日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

43+阅读 · 2023年4月19日

A Comprehensive Survey on Deep Graph Representation Learning

Arxiv

111+阅读 · 2023年4月11日

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

Arxiv

232+阅读 · 2023年4月7日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

88+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

501+阅读 · 2023年3月31日

Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

Arxiv

156+阅读 · 2023年3月29日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

64+阅读 · 2023年3月29日

Nature Language Reasoning, A Survey

Arxiv

83+阅读 · 2023年3月26日

VIP会员

文章信息

相关主题

视觉语言模型

最新内容

《基于智能体建模与仿真的无人机蜂群模型目标定位涌现行为比较分析》360页

《基于智能体建模与仿真的无人机蜂群模型目标定位涌现行为比较分析》360页

专知会员服务

7+阅读 · 7月18日

欧洲智能弹药战略创新管理：迈向制导弹药、巡飞系统与自主无人机蜂群的技术主权研究路线图

欧洲智能弹药战略创新管理：迈向制导弹药、巡飞系统与自主无人机蜂群的技术主权研究路线图

专知会员服务

5+阅读 · 7月18日

从领域适配到部署与可解释：Berkeley博士论文解析大语言模型真实落地

从领域适配到部署与可解释：Berkeley博士论文解析大语言模型真实落地

专知会员服务

6+阅读 · 7月18日

综述 | 长程智能体研究全景：基础、演化、框架、优化与前沿

综述 | 长程智能体研究全景：基础、演化、框架、优化与前沿

专知会员服务

4+阅读 · 7月18日

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

专知会员服务

8+阅读 · 7月17日

《边缘端实时无线感知赋能现场多机器人部署》200页

《边缘端实时无线感知赋能现场多机器人部署》200页

专知会员服务

7+阅读 · 7月17日

战力倍增器：自主武器系统与乌克兰及加沙冲突

战力倍增器：自主武器系统与乌克兰及加沙冲突

专知会员服务

4+阅读 · 7月17日

人工智能赋能战场情报：提速决策进程

人工智能赋能战场情报：提速决策进程

专知会员服务

2+阅读 · 7月17日

《拥抱新兴技术：面向未来军官的教育革新》

《拥抱新兴技术：面向未来军官的教育革新》

专知会员服务

5+阅读 · 7月17日

ACM MM 2026 | MAR-GRPO：稳定混合图像生成的强化学习训练

ACM MM 2026 | MAR-GRPO：稳定混合图像生成的强化学习训练

专知会员服务

3+阅读 · 7月17日

综述 | 大模型水印理论与部署：来源追踪、攻击鲁棒与可信治理

综述 | 大模型水印理论与部署：来源追踪、攻击鲁棒与可信治理

专知会员服务

4+阅读 · 7月17日

《火线上的后勤保障：对抗环境下的随机规划模型研究——俄乌场景案例分析》99页

《火线上的后勤保障：对抗环境下的随机规划模型研究——俄乌场景案例分析》99页

专知会员服务

12+阅读 · 7月16日

《无人地面战车（UGV）的崛起》报告

《无人地面战车（UGV）的崛起》报告

专知会员服务

7+阅读 · 7月16日

《无人机参数化与集群飞行创新项目的监控流程管理：模型、策略及自适应解决方案》

《无人机参数化与集群飞行创新项目的监控流程管理：模型、策略及自适应解决方案》

专知会员服务

6+阅读 · 7月16日

《美军开放式任务系统（OMS）定义与文档（D&D）——Java关键抽象层（CAL）接口生成规范》47页标准

《美军开放式任务系统（OMS）定义与文档（D&D）——Java关键抽象层（CAL）接口生成规范》47页标准

专知会员服务

14+阅读 · 7月16日

相关VIP内容

机器人领域中最佳的三维场景表示是什么？——从几何表示到基础模型

机器人领域中最佳的三维场景表示是什么？——从几何表示到基础模型

专知会员服务

16+阅读 · 2025年12月4日

DeepSeek模型综述：V1 V2 V3 R1-Zero

DeepSeek模型综述：V1 V2 V3 R1-Zero

专知会员服务

116+阅读 · 2025年2月11日

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

专知会员服务

24+阅读 · 2023年5月10日

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

专知会员服务

17+阅读 · 2022年5月10日

UTC: 用于视觉对话的任务间对比学习的统一Transformer

UTC: 用于视觉对话的任务间对比学习的统一Transformer

专知会员服务

14+阅读 · 2022年5月4日

【干货书】1270页pdf《概率图模型：原理与技术》MIT出版，用于构建和使用复杂系统概率模型的通用框架，使计算机能够使用可用信息做出决策。Probabilistic Graphical Models: Principles and Applications

【干货书】1270页pdf《概率图模型：原理与技术》MIT出版，用于构建和使用复杂系统概率模型的通用框架，使计算机能够使用可用信息做出决策。Probabilistic Graphical Models: Principles and Applications

专知会员服务

47+阅读 · 2022年4月11日

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

专知会员服务

45+阅读 · 2022年3月6日

学习具有层次标签的图像表示，Learning Representations For Images With Hierarchical Labels

学习具有层次标签的图像表示，Learning Representations For Images With Hierarchical Labels

专知会员服务

38+阅读 · 2020年4月6日

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

【深度图相似学习综述】Deep Graph Similarity Learning: A Survey，29页pdf，117条参考文献

专知会员服务

98+阅读 · 2019年12月31日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知会员服务

112+阅读 · 2019年11月25日

热门VIP内容

开通专知VIP会员享更多权益服务

欧洲智能弹药战略创新管理：迈向制导弹药、巡飞系统与自主无人机蜂群的技术主权研究路线图

综述 | 长程智能体研究全景：基础、演化、框架、优化与前沿

《基于智能体建模与仿真的无人机蜂群模型目标定位涌现行为比较分析》360页

从领域适配到部署与可解释：Berkeley博士论文解析大语言模型真实落地

相关资讯

Python图像处理，366页pdf，Image Operators Image Processing in Python

Python图像处理，366页pdf，Image Operators Image Processing in Python

专知

15+阅读 · 2020年7月23日

对话管理的综述论文:最近的进展和挑战，A Survey on Dialog Management

对话管理的综述论文:最近的进展和挑战，A Survey on Dialog Management

专知

12+阅读 · 2020年5月14日

论文浅尝 | GEOM-GCN: Geometric Graph Convolutional Networks

论文浅尝 | GEOM-GCN: Geometric Graph Convolutional Networks

开放知识图谱

14+阅读 · 2020年4月8日

【CVPR2020-清华大学】渐进对抗网络的细粒度域适应，Progressive Adversarial Networks

【CVPR2020-清华大学】渐进对抗网络的细粒度域适应，Progressive Adversarial Networks

专知

31+阅读 · 2020年4月4日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知

245+阅读 · 2019年11月18日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

31+阅读 · 2018年7月12日

图像和文本的融合表示学习——Text2Image和Image2Text

图像和文本的融合表示学习——Text2Image和Image2Text

专知

125+阅读 · 2018年6月11日

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

开放知识图谱

36+阅读 · 2018年3月30日

相关论文

DeepSeek-V3 Technical Report

Arxiv

18+阅读 · 2024年12月27日

Is ChatGPT a Good Recommender? A Preliminary Study

Arxiv

176+阅读 · 2023年4月20日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

43+阅读 · 2023年4月19日

A Comprehensive Survey on Deep Graph Representation Learning

Arxiv

111+阅读 · 2023年4月11日

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

Arxiv

232+阅读 · 2023年4月7日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

88+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

501+阅读 · 2023年3月31日

Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

Arxiv

156+阅读 · 2023年3月29日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

64+阅读 · 2023年3月29日

Nature Language Reasoning, A Survey

Arxiv

83+阅读 · 2023年3月26日

相关基金

“自然语言-草图”耦合的地理场景查询方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于深层特征学习的RGB-D人体行为识别方法

国家自然科学基金

4+阅读 · 2015年12月31日

“模块化自组装”DNA计算模型的研究

国家自然科学基金

3+阅读 · 2015年12月31日

2D/3D视觉信息融合仿生SLAM关键问题研究

国家自然科学基金

3+阅读 · 2015年12月31日

“数据-知识”驱动的大区域高分辨率遥感影像多尺度分割并行计算方法

国家自然科学基金

0+阅读 · 2015年12月31日

基于稀疏表达理论和RGBD图像的人脸表情识别

国家自然科学基金

0+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

基于上下文感知和异质特征集成的SAR图像分割与评价

国家自然科学基金

2+阅读 · 2015年12月31日

Web页面数据对象的感知理解与计算

国家自然科学基金

0+阅读 · 2014年12月31日

面向时空变化的GIS数据模型

国家自然科学基金

6+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员