Large language models (LLMs) are increasingly used for code generation in data science, but they often struggle with complex sequential tasks and produce logical errors. Applying them to geospatial data processing is particularly challenging: it requires incorporating complex data structures and spatial constraints and making effective use of diverse function calls, and LLMs tend to hallucinate when working with less commonly used geospatial libraries. To tackle these problems, we introduce GeoAgent, a new interactive framework designed to help LLMs handle geospatial data processing more effectively. GeoAgent pioneers the integration of a code interpreter, static analysis, and Retrieval-Augmented Generation (RAG) within a Monte Carlo Tree Search (MCTS) algorithm, offering a novel approach to geospatial data processing. In addition, we contribute a new benchmark specifically designed to evaluate LLM-based approaches on geospatial tasks. The benchmark draws on a variety of Python libraries and includes both single-turn and multi-turn tasks covering data acquisition, data analysis, and visualization. By providing a comprehensive evaluation across diverse geospatial contexts, it sets a new standard for developing LLM-based approaches to geospatial data analysis. Our findings suggest that relying solely on an LLM's internal knowledge is insufficient for accurate geospatial task programming, which requires coherent multi-step processes and multiple function calls. Compared with the baseline LLMs, GeoAgent demonstrates superior performance, with notable improvements in function calls and task completion. These results also offer valuable insights for the future development of LLM agents for automatic geospatial data analysis task programming.