GeoDecoder: Empowering Multimodal Map Understanding

This paper presents GeoDecoder, a multimodal model dedicated to processing geospatial information in maps. Built on the BeitGPT architecture, GeoDecoder incorporates specialized expert modules for image and text processing. On the image side, GeoDecoder uses GaoDe Amap as the base map, which inherently carries essential details about road and building shapes, relative positions, and other attributes. Using rendering techniques, the model integrates external data and features such as symbol markers, driving trajectories, heatmaps, and user-defined markers, eliminating the need for extra feature engineering. The text module accepts various context texts and question prompts and generates text outputs in the style of GPT. Because the model is GPT-based, multiple tasks can be trained and executed end to end within a single model. To enhance map cognition and enable GeoDecoder to learn the distribution of geographic entities in Beijing, we devised eight fundamental geospatial tasks and pretrained the model on large-scale text-image samples. We then performed rapid fine-tuning on three downstream tasks, which yielded significant performance improvements. GeoDecoder demonstrates a comprehensive understanding of map elements and their associated operations, enabling efficient, high-quality handling of diverse geospatial tasks across different business scenarios.
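The idea of rendering external data onto the base map, instead of engineering separate features for it, can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the `Sample` type, the `draw_polyline` and `build_sample` helpers, the prompt layout, and the use of a plain pixel grid in place of a real Amap tile are all hypothetical.

```python
from dataclasses import dataclass

# All names below are hypothetical and for illustration only;
# none of them come from the GeoDecoder paper.
Pixel = tuple[int, int, int]  # RGB triple


@dataclass
class Sample:
    image: list[list[Pixel]]  # rendered map tile, row-major
    prompt: str               # context text followed by the question


def draw_polyline(image, points, color):
    """Rasterize a polyline onto the image in place (Bresenham's line algorithm)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = abs(x1 - x0), -abs(y1 - y0)
        sx = 1 if x0 < x1 else -1
        sy = 1 if y0 < y1 else -1
        err = dx + dy
        x, y = x0, y0
        while True:
            # Skip pixels that fall outside the tile.
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                image[y][x] = color
            if (x, y) == (x1, y1):
                break
            e2 = 2 * err
            if e2 >= dy:
                err += dy
                x += sx
            if e2 <= dx:
                err += dx
                y += sy


def build_sample(base_map, trajectory, context, question, color=(255, 0, 0)):
    """Overlay a trajectory on a copy of the base map and pair it with a text prompt."""
    image = [row[:] for row in base_map]  # leave the base map untouched
    draw_polyline(image, trajectory, color)
    prompt = f"{context}\nQuestion: {question}\nAnswer:"
    return Sample(image=image, prompt=prompt)


# Usage: a 4x4 gray "tile" with a diagonal drive trajectory drawn in red.
GRAY, RED = (128, 128, 128), (255, 0, 0)
base = [[GRAY] * 4 for _ in range(4)]
sample = build_sample(base, [(0, 0), (3, 3)], "A drive trajectory is shown in red.",
                      "Which direction does the trajectory head?")
```

In this sketch the trajectory becomes part of the image itself, so the same image-plus-prompt interface could carry markers or heatmaps by changing only what is drawn, and different tasks by changing only the prompt text.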

