Large language models (LLMs) are increasingly used for code generation in data science, but they often struggle with complex sequential tasks, which leads to logical errors. Applying them to geospatial data processing is particularly challenging: they have difficulty incorporating complex data structures and spatial constraints, make poor use of diverse function calls, and tend to hallucinate when working with less-used geospatial libraries. To address these problems, we introduce GeoAgent, a new interactive framework designed to help LLMs handle geospatial data processing more effectively. GeoAgent integrates a code interpreter, static analysis, and Retrieval-Augmented Generation (RAG) within a Monte Carlo Tree Search (MCTS) algorithm, a novel combination for geospatial data processing. We also contribute a new benchmark designed specifically to evaluate LLM-based approaches on geospatial tasks. The benchmark draws on a variety of Python libraries and includes both single-turn and multi-turn tasks covering data acquisition, data analysis, and visualization. By offering a comprehensive evaluation across diverse geospatial contexts, it sets a new standard for developing LLM-based approaches to geospatial data analysis. Our findings suggest that an LLM's internal knowledge alone is insufficient for accurate geospatial task programming, which requires coherent multi-step processes and multiple function calls. Compared with the baseline LLMs, GeoAgent demonstrates superior performance, with notable improvements in function calls and task completion. These results also offer valuable insights for the future development of LLM agents for automatic geospatial data analysis task programming.