This paper presents GeoDecoder, a multimodal model dedicated to processing geospatial information in maps. Built on the BeitGPT architecture, GeoDecoder incorporates specialized expert modules for image and text processing. On the image side, GeoDecoder uses GaoDe Amap as its base map, which inherently encodes essential details such as road and building shapes, their relative positions, and other attributes. By rendering external data and features onto the base map (for example, symbol markers, driving trajectories, heatmaps, and user-defined markers), the model integrates them seamlessly without extra feature engineering. The text module accepts various context texts and question prompts and generates GPT-style text outputs. Furthermore, because the model is GPT-based, multiple tasks can be trained and executed end to end within a single model. To strengthen map cognition and enable GeoDecoder to learn the distribution of geographic entities in Beijing, we designed eight fundamental geospatial tasks and pretrained the model on large-scale image-text samples. We then rapidly fine-tuned it on three downstream tasks, which yielded significant performance improvements. GeoDecoder demonstrates a comprehensive understanding of map elements and their associated operations, enabling diverse geospatial tasks to be applied efficiently and with high quality across different business scenarios.
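The following is a rough illustration of the rendering-based approach. It rasterizes a driving trajectory and a symbol marker directly onto a map tile, so that the features reach an image model as pixels rather than as engineered inputs. This is a toy sketch under stated assumptions. The tile is an RGB grid held as nested Python lists, and trajectory points are assumed to be already projected to pixel coordinates. `blank_tile`, `render_trajectory`, and `render_marker` are hypothetical helpers and are not part of GeoDecoder or Amap. A real pipeline would draw on actual Amap tiles with a graphics library.

```python
from typing import List, Tuple

# Hypothetical helpers: a toy illustration of rendering features onto a map
# tile, NOT GeoDecoder's actual rendering pipeline.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Point = Tuple[int, int]


def blank_tile(width: int, height: int, color: Pixel = (255, 255, 255)) -> Image:
    """Stand-in for a base-map tile: a height x width grid of RGB pixels."""
    return [[color for _ in range(width)] for _ in range(height)]


def bresenham(x0: int, y0: int, x1: int, y1: int) -> List[Point]:
    """Integer pixel coordinates on the segment from (x0, y0) to (x1, y1)."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step horizontally
            err += dy
            x0 += sx
        if e2 <= dx:  # step vertically
            err += dx
            y0 += sy
    return points


def render_trajectory(tile: Image, trajectory: List[Point],
                      color: Pixel = (255, 0, 0)) -> Image:
    """Draw a polyline of pixel-space points onto the tile in place."""
    height, width = len(tile), len(tile[0])
    for (xa, ya), (xb, yb) in zip(trajectory, trajectory[1:]):
        for x, y in bresenham(xa, ya, xb, yb):
            if 0 <= x < width and 0 <= y < height:  # clip to the tile
                tile[y][x] = color
    return tile


def render_marker(tile: Image, center: Point, radius: int = 1,
                  color: Pixel = (0, 0, 255)) -> Image:
    """Draw a filled square marker centered on a pixel, clipped to the tile."""
    cx, cy = center
    height, width = len(tile), len(tile[0])
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            if 0 <= x < width and 0 <= y < height:
                tile[y][x] = color
    return tile


tile = blank_tile(8, 8)
render_trajectory(tile, [(0, 0), (3, 3), (7, 3)])
render_marker(tile, (6, 6), radius=1)
```

The point of the design is that the trajectory and marker become part of the image itself. The image encoder can then see them in their spatial context alongside roads and buildings, and no separate per-feature input format is needed.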